Subgroup Identification Analysis

Applications to consistency analyses in MRCTs

Author

Larry Leon

Code
knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE)

options(warn = -1)

rm(list=ls())

library(tinytex)
library(ggplot2)
library(table1)

library(gt)

# test these packages
# just for testing any conflicts
# library(tidyverse)
# library(plyr)
# library(dplyr)
# library(glmnet)

library(survival)
library(data.table)
library(randomForest)
library(grf)
library(policytree)
library(DiagrammeR)

library(grid)
library(forestploter)
library(randomizr)

codepath <- c("Rcode/forestsearch/R/")
source(paste0(codepath,"source_forestsearch_v0.R"))
source_fs_functions(file_loc=codepath)

source("Rcode/kmplotting/R/KMplotting_functions_v0.R")

# Save output flag
output <- FALSE
# Loading results
loadit <- TRUE

# Record time of all analyses
tALL.start<-proc.time()[3]
 
#library(doParallel)
#cat("Number of cores=",c(detectCores()),"\n")

# Parallel backend via the future package
library(future)
# Note: leave workers unspecified
# on my linux machine workers(=100) needs to be set
plan(multisession)
#plan(multisession, workers=120)

nsims <- 1000
nboots <- 1000

1 Summary

We evaluate the potential for utilizing subgroup analyses to enhance regional consistency analyses in oncology multi-regional clinical trials (MRCTs). We consider a setting where Asia-Pacific (AP) countries constitute approximately \(15-20\%\) of a global trial and the criterion for consistency is a positive point estimate (Cox model hazard-ratio \(< 1.0\)) while the global trial meets statistical significance.

Suppose the following –

ITT meets statistical significance but the AP regional analysis appears problematic (hazard-ratio estimate \(\approx 1.0\))

Can a subgroup with enhanced benefit be identified in the non-AP population which then translates to the AP population?

That is, suppose there exists a subgroup \({\cal G}\) such that the treatment effect is more pronounced and (standard) Cox model analyses based on the \({\cal G}\) subgroup of AP (AP\(({\cal G})\), say) enable meeting consistency.

We assume Cox model analyses are the primary analysis where only the treatment arm is included as a covariate (with or without stratification by randomization factors).

While there is a plethora of subgroup identification approaches available, we consider the forestSearch approach, León et al. (2024), which was developed for identifying strong subgroup treatment effects (i.e., strong HTEs).

We consider three objectives: (A) Identify a subgroup with the largest detrimental effect (harm, say), defined in terms of the largest hazard-ratio estimate exceeding a clinical threshold in favor of control (e.g., \(\hat\beta \geq \log(0.90)\)); (B) Identify the smallest subgroup with a hazard-ratio estimate indicative of harm (e.g., \(\hat\beta \geq \log(0.90)\)); and (C) Identify the largest subgroup with strong benefit (e.g., \(\hat\beta \leq \log(0.60)\)). These objectives are related to the overarching goal of identifying a subgroup with enhanced benefit: in (A) and (B) we identify a subgroup indicative of harm, \(\hat{\cal H}\) say, in which case the complementary subgroup (\({\hat{\cal H}}^{c}:= {\hat{\cal G}}\)) may be generally beneficial, whereas in (C) we more directly target a strong (clinically compelling) beneficial subgroup effect.

In this note we provide a brief review of the Weibull model in Section 2, where we describe a (2-phase) spline model that we utilize for inducing differential treatment effects by way of a “biomarker”. Section 3 provides examples of how to generate “biomarker profiles” of interest, where one can define subgroups based on biomarker cutpoints which correspond to underlying causal hazard-ratio effects (e.g., the causal hazard-ratio effect for biomarker \(\geq 2\) is approximately \(0.5\), say). Section 4 describes the simulation setup (data generating model/process) and generates an example dataset used for analysis illustration, and in Section 5 we discuss details of implementing (A)-(C) above. In Section 6 we evaluate the operating characteristics for objective (B) under conditions where there exists a strong biomarker effect, and under the null of a uniform treatment benefit (i.e., \({\cal G} = \emptyset\)). Lastly, Section 7 briefly discusses estimation (de-biased point and CI estimates) – taking into account the identification algorithm – where there is interest in inference regarding the population from which the subgroup was identified (e.g., non-AP).

Throughout this note R code is provided and illustrated based on real trial data (subject-level baseline characteristics are fixed at their observed quantities). Our hope is that these notes can serve as a framework for future trial planning, as well as for simulating from study data to understand operating characteristics, both for internal teams and for external discussions.

A diagram illustrating the general flow follows.

Code
library(png)
library(grid)
img <- readPNG('MRCT-Flow.png')
grid.raster(img)

Subgroup identification analysis based on non-AP data to identify subgroups of marked benefit in AP. The operating characteristics evaluate the whole process, starting from the analysis option.

2 Weibull AFT/Cox model with biomarker effects

2.1 Brief Review

For the Weibull distribution with shape parameter \(\nu\) and scale parameter \(\theta\) the density, cdf, survival, hazard and cumulative hazard functions are: \[\begin{eqnarray*} f(t) &=& \nu \theta^{-\nu}t^{\nu-1}\exp(-(t/\theta)^{\nu}), \cr F(t) &=& \int_{0}^{t}\nu \theta^{-\nu}s^{\nu-1}\exp(-(s/\theta)^{\nu})ds \cr &=& \int_{0}^{(t/\theta)^{\nu}}e^{-w}dw \cr & & (w=(s/\theta)^{\nu},dw=\nu s^{\nu-1}\theta^{-\nu}ds) \cr &=&1-\exp(-(t/\theta)^{\nu}), \cr S(t) &=& \exp(-(t/\theta)^{\nu}), \cr \lambda(t)&=&\nu \theta^{-\nu}t^{\nu-1}, \cr \Lambda(t) &=& -\log(S(t))=(t/\theta)^{\nu}. \end{eqnarray*}\] Note that here we define the density to correspond with R’s definition. For shape parameter \(\nu \in (0,1)\) the hazard is strictly decreasing in \(t \geq 0\), whereas for \(\nu >1\) the hazard is strictly increasing in \(t \geq 0\).

Note: \(\Lambda(T) \sim E(1)\)

The cumulative hazard function \(\Lambda(\cdot)\) evaluated at \(T\), \(\Lambda(T)\), as a random variable has cdf \[\begin{eqnarray*} \Pr(\Lambda(T) \leq t) &=& \Pr(-\log(1-F(T)) \leq t) = \Pr(1-F(T) \geq e^{-t}) \cr &=& \Pr(T \leq F^{-1}(1-e^{-t})) = F(F^{-1}(1-e^{-t})) = 1-e^{-t}, \end{eqnarray*}\] which is the cdf of the unit exponential distribution, \(E(1)\) (say).
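As a quick empirical illustration of this unit-exponential property, the following standalone sketch draws a large Weibull sample with base R's rweibull and checks that \(\Lambda(T)\) has mean and variance approximately 1 (the shape and scale values below are arbitrary choices for illustration, not taken from the case-study).

Code
# Empirical check that Lambda(T) ~ E(1); nu and theta are arbitrary
set.seed(1)
nu <- 0.8; theta <- 2
tt <- rweibull(100000, shape = nu, scale = theta)
q <- (tt/theta)^nu               # Lambda(T) evaluated at the draws
c(mean = mean(q), var = var(q))  # both approximately 1 for E(1)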

In the following we use

\[\begin{equation} \tag{1} \Lambda(T) \sim E(1) \end{equation}\] to represent the Weibull regression model as an AFT model which is also a Cox model.

Now, \(\Lambda(T)=(T/\theta)^{\nu}\); write \(Q=-\log(S(T))=\Lambda(T)=(T/\theta)^{\nu}\), where from (1), \(Q \sim E(1)\). That is, \(\log(Q)=\nu(\log(T)-\log(\theta))\), which can be expressed as

\[\begin{equation} \tag{2} \log(T)=\log(\theta)+ \tau \log(Q) := \log(\theta) + \tau \epsilon, \end{equation}\] where \(\tau=1/\nu\) and it is easy to show that \(\epsilon=\log(Q)\) has the “extreme value” distribution with density \(f_{\epsilon}(x)=\exp(x-e^{x})\) for \(x \in {\cal R}\). Here the range of \(\log(T) \in {\cal R}\) is unrestricted. The survreg routine uses the parameterization \((2)\) and therefore estimates \(\log(\theta)\) and \(\tau=1/\nu\).
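As a small check of the parameterization (2) (illustrative only; the shape and scale values are arbitrary), an intercept-only survreg fit to uncensored Weibull draws should return an intercept near \(\log(\theta)\) and a scale near \(\tau = 1/\nu\):

Code
# Intercept-only survreg fit: coef ~ log(theta), scale ~ 1/nu
set.seed(2)
nu <- 2; theta <- 5
tt <- rweibull(20000, shape = nu, scale = theta)
fit <- survreg(Surv(tt, rep(1, length(tt))) ~ 1, dist = "weibull")
c(intercept = unname(coef(fit)), log.theta = log(theta),
  scale = fit$scale, tau = 1/nu)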

To incorporate covariates \(L\) (say) we specify \[\lambda(t;L)=\big(\nu \theta^{-\nu}t^{\nu-1} \big) \exp(L'\beta) := \lambda_{0}(t)\exp(L'\beta),\] where \(\lambda_{0}(t)\) is the hazard, say, for \(T_{0} \sim \hbox{Weibull}(\nu,\theta)\). This is a special case of the proportional hazards model. The chf (conditional chf with covariate vector \(L\)) is \(\Lambda(t;L)=(t/\theta)^{\nu}\exp(L'\beta)\), so that analogous to above this leads to the representation

\[\begin{equation} \tag{3} \log(T) =\log(\theta)+\tau[-L'\beta+\epsilon] =\log(\theta)+L'\gamma + \tau \epsilon, \end{equation}\] where \(\gamma=-\tau\beta\), with \(\tau\) and \(\epsilon\) defined in (2). R’s survreg uses this AFT parameterization, so the estimated components of \(\gamma\), \(\gamma_{p}\) say, correspond to \(-\tau\beta_{p}\) for \(p=1,\ldots,m\) (\(m\) is the dimension of \(\beta\)).

When fitting the AFT model (3) via survreg we therefore transform the parameters \(\hat\gamma\) to the Weibull (Cox) hazard-ratio parameterization via

\[\begin{equation} \tag{4} \hat\beta = -\hat\gamma / \hat{\tau}. \end{equation}\]

As an illustration, the table below compares Weibull survreg model fits for the case-study dataset with covariates Treat and Ecog1 (Ecog = 1 vs 0), where components of \(\hat\gamma\) from model (3) are estimated by survreg and \(\hat\beta\) is formed via (4). The Weibull estimates \(\hat\beta\) are compared to their Cox model counterparts.

Code
# case-study example
df.case <- read.table("simdatasource/dfexplore.csv", header=TRUE, sep=",")
#names(df.case)
Code
# Comparing Weibull vs Cox with case-study where outcomes are artificial
# This is just for illustration to show conversion of Weibull parameters from 
# AFT regression to Weibull hazard 
fit.weib_ex <- survreg(Surv(tte,pmax(event,1)) ~ treat + ecog1, dist='weibull', data=df.case)
tauhat <- fit.weib_ex$scale
# convert (treat,ecog1) regression parms to Weibull hazard-ratio
bhat.weib <- -(1)*coef(fit.weib_ex)[c(2,3)]/tauhat
# Compare to Cox 
fit.cox_ex <- coxph(Surv(tte,pmax(event,1)) ~ treat + ecog1, data=df.case)
res <- cbind(bhat.weib,coef(fit.cox_ex))
res <- as.data.frame(res)
colnames(res)<-c("Weibull","Cox")
res |> gt() |>
fmt_number(columns=1:2,decimals=6) |>
tab_header(title="Comparing Weibull to Cox hazard ratio estimates",
subtitle="Case-study dataset with artificial outcomes")
Comparing Weibull to Cox hazard ratio estimates
Case-study dataset with artificial outcomes

        Weibull      Cox
treat   0.157144  0.176033
ecog1   0.472355  0.472196

2.2 Biomarker effects with spline model

We now outline how potential outcomes are simulated according to parameters fit to the case-study dataset, but with parameters specified to induce biomarker effects. That is, causal treatment effects (on the log(hazard-ratio) scale) follow a spline model in which effects increase with biomarker level, including various degrees of limited treatment effect at low biomarker levels.

We first consider a Weibull model with treatment and a single biomarker covariate \(Z\) where we write the linear predictor of the Cox model \(L'\beta\) (say) as

\[\begin{equation} \tag{5} L'\beta := \beta_{1}\hbox{Treat} + \beta_{2}\hbox{Z} + \beta_{3}\hbox{Z}\hbox{Treat} + \beta_{4}(\hbox{Z}-k)I(\hbox{Z}>k) + \beta_{5}(\hbox{Z}-k)I(\hbox{Z}>k)\hbox{Treat}. \end{equation}\]

Following the potential-outcome approach, let \(l_{x,z}\) denote a subject’s log relative hazard (linear predictor) “had they followed treatment regimen \(Treat=x\) while having biomarker level \(Z=z\)”. That is, for a subject with biomarker level \(Z=z\) we can simulate their survival outcomes under both treatment (\(x=1\)) and control (\(x=0\)) conditions. Let \(\beta^{0} = (\beta_{1}^{0},\ldots,\beta_{5}^{0})'\) denote the true coefficients and write the hazard function as

\[\lambda_{x,z}(t) = \lambda_{0}(t)\exp(l_{x,z}), \quad \hbox{say}.\]

Writing

\[\begin{equation} l_{x,z} = \beta^{0}_{1}x + \beta^{0}_{2}z + \beta^{0}_{3}zx + \beta^{0}_{4}(z-k)I(z>k) + \beta^{0}_{5}(z-k)I(z>k)x, \end{equation}\] the log of the hazard ratio for biomarker level \(z\) under treatment (\(x=1\)) relative to control (\(x=0\)) is given by

\[\begin{equation} \tag{6} \psi^{0}(z) := \log(\lambda_{1,z}(t)/\lambda_{0,z}(t)) = \beta^{0}_{1} + \beta^{0}_{3}z + \beta^{0}_{5}(z-k)I(z>k). \end{equation}\]

2.3 Causal log-hazard-ratio

The log(hazard-ratio) for biomarker level \(z\) is a linear function of \(z\) with a change-point (in slope) at \(z=k\) given by

\[\begin{eqnarray*} \psi^{0}(z) &=& \beta^{0}_{1} + \beta^{0}_{3}z, \quad \hbox{for} \ z \leq k, \cr &=& \beta^{0}_{1} + \beta^{0}_{3}z + \beta^{0}_{5}(z-k), \quad \hbox{for} \ z > k. \end{eqnarray*}\]

Log hazard-ratio parameters \((\beta^{0}_{1},\beta^{0}_{3},\beta^{0}_{5})\) can be chosen to generate “treatment effect patterns” by specifying \(\psi^{0}(z)\) values at \(z=0\), \(z=k\), and \(z=\zeta\) for \(\zeta > k\). For specified \(\psi^{0}(0)\), \(\psi^{0}(k)\), and \(\psi^{0}(\zeta)\) we have

\[\begin{eqnarray} \beta^{0}_{1} &=& \psi^{0}(0), \cr \beta^{0}_{3} &=& {(\psi^{0}(k) - \beta^{0}_{1}) \over k}, \cr \beta^{0}_{5} &=& {(\psi^{0}(\zeta) - \beta^{0}_{1} - \beta^{0}_{3}\zeta) \over (\zeta -k)}. \end{eqnarray}\]
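These identities can be coded directly. The standalone sketch below (not the get_dgm_stratified implementation) solves for \((\beta^{0}_{1},\beta^{0}_{3},\beta^{0}_{5})\) using the working-example targets \(\psi^{0}(0)=\log(3)\), \(\psi^{0}(5)=\log(1.25)\), \(\psi^{0}(10)=\log(0.5)\) with \(k=5\) and \(\zeta=10\), and confirms the reconstructed \(\psi^{0}(z)\) hits the targets:

Code
# Solve spline coefficients from specified log-HRs at z = 0, k, zeta
k <- 5; zeta <- 10
psi.targets <- log(c(3, 1.25, 0.5))   # psi(0), psi(k), psi(zeta)
b1 <- psi.targets[1]
b3 <- (psi.targets[2] - b1)/k
b5 <- (psi.targets[3] - b1 - b3*zeta)/(zeta - k)
# Reconstruct psi(z) and check it reproduces the three targets
psi <- function(z) b1 + b3*z + b5*pmax(z - k, 0)
cbind(target = psi.targets, fitted = psi(c(0, k, zeta)))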

The function get_dgm_stratified generates \(\psi^{0}(z)\) according to desired “biomarker treatment effect patterns” as follows.

  • Let \(X\) and \(Z\) denote the treatment and biomarker variables in the case-study dataset and, for specified \(k\), form the covariates \(L:=(X,Z,ZX,(Z-k)I(Z>k),(Z-k)I(Z>k)X)\);
  • Fit the Weibull model (recall on AFT scale) to get \(\log(\hat\theta)\), \(\hat\tau=1/\hat{\nu}\), and \(\hat\gamma\) corresponding to \(L\);
  • \(\hat\gamma\) is in terms of the AFT parameterization given by model (3)
  • Next transform to the Weibull (Cox) log hazard-ratio parameterization (4): \(\hat\beta = -\hat\gamma/\hat\tau\)
  • Set “true” parameters \(\theta^{0}=\hat\theta\), and \(\tau^{0}=\hat\tau\)
  • Initialize the “true” parameter \(\beta^{0} = \hat\beta\) and re-define parameters 1, 3, and 5 in order to satisfy specified \(\psi^{0}(0)\), \(\psi^{0}(k)\), and \(\psi^{0}(\zeta)\): \(\beta^{0}[1] = \psi^{0}(0)\), \(\beta^{0}[3] = (\psi^{0}(k) - \beta^{0}[1])/k\), and \(\beta^{0}[5] = (\psi^{0}(\zeta) - \beta^{0}[1] - \beta^{0}[3]\zeta)/(\zeta -k)\);
  • Form corresponding \(\gamma^{0}= -\beta^{0}\tau^{0}\)
  • For simulations we use the AFT parameterization (3) to generate \(\log(T)\) outcomes according to \(\log(T) = \log(\theta^{0}) + L'\gamma^{0} + \tau^{0}\epsilon\), where recall \(\epsilon\) has the “extreme value” distribution.
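The last step can be sketched as follows with a single treatment covariate (hypothetical parameter values; the full data-generating code lives in draw_sim_stratified). Drawing \(\epsilon = \log(Q)\) with \(Q \sim E(1)\) gives extreme-value errors, and an uncensored Cox fit approximately recovers the intended log hazard-ratio:

Code
# AFT draw: log(T) = log(theta0) + gamma0*treat + tau0*eps, eps extreme value
set.seed(3)
n <- 5000
theta0 <- 12; tau0 <- 0.9     # hypothetical scale parameters
beta0 <- log(0.7)             # intended Cox log-HR for treatment
gamma0 <- -beta0*tau0         # corresponding AFT-scale coefficient
treat <- rbinom(n, 1, 0.5)
eps <- log(rexp(n))           # extreme-value errors via eps = log(Q), Q ~ E(1)
tt <- exp(log(theta0) + gamma0*treat + tau0*eps)
coef(coxph(Surv(tt, rep(1, n)) ~ treat))   # approximately log(0.7)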

The inputs of the get_dgm_stratified function are:

  • The case-study dataset (“df”)
  • “knot”=\(k\), “zeta”=\(\zeta\), and “log.hrs”= (\(\psi^{0}(0),\psi^{0}(k),\psi^{0}(\zeta))\)
  • Note that get_dgm_stratified allows outcomes to follow stratified Weibull (Cox) models, in which case the log(hazard-ratio) effects depend on the strata, “strata_tte”; the default is non-stratified (“strata_tte=NULL”)
  • If stratified the \(\tau\) parameters and hence \(\gamma^{0}\) depend on the strata
  • If stratified, then to simplify, the \(\gamma^{0}\) effects are calculated based on the median of the \(\tau^{0}\)’s (\(\tau^{0}=\tau_{med}\), say).

To-do: explain the outputs and how they are used in draw_sim_stratified.

2.4 Example where treatment effects increase with increasing biomarker

We assume that \(z=0\) indicates biomarker-negative values, with \(z>1\) indicating positive levels. As an example, suppose the log(hazard ratio) at \(z=0\) is \(\psi^{0}(0)=\log(3)\) and decreases linearly for \(z \leq 5\) (with slope \(\beta_{3}^{0}\)) such that at \(z=5\), \(\psi^{0}(5)=\log(1.25)\), and for \(z>5\) decreases linearly (with a change in slope and intercept) such that at \(z=10\), \(\psi^{0}(10) = \log(0.5)\). In the following we describe summary measures of the treatment effects as a function of increasing [or decreasing] values of the biomarker. In this example treatment is detrimental for lower values of the biomarker, with treatment effects increasing fairly quickly as the biomarker increases, and an overall “average hazard ratio” (AHR) of \(\approx 0.74\) (see Working Example below).

2.5 Including prognostic factors \(W\)

We extend the model to include a baseline prognostic factor \(W\) within \(L\) (effect parameter \(\beta^{0}_{w}\)) where

\[\begin{equation} l_{x,z,w} = \beta^{0}_{1}x + \beta^{0}_{2}z + \beta^{0}_{3}zx + \beta^{0}_{4}(z-k)I(z>k) + \beta^{0}_{5}(z-k)I(z>k)x + \beta^{0}_{w}w. \end{equation}\]

Note that defining \(m(1,z,w)\) [\(m(0,z,w)\)] as the conditional expected value of \(\log(T)\) with treatment set to experimental [control], given biomarker level \(Z=z\) and \(W=w\),

\[\psi^{0}(z,w) = \{m(0,z,w)-m(1,z,w)\}/\tau,\] which corresponds to the (\(\tau\)-scaled) difference in the true means of the potential outcomes under control and experimental conditions (\(\log(T[0,z,w])\) versus \(\log(T[1,z,w])\), say).

3 Specifying biomarker treatment effects

3.1 Average hazard ratios (AHRs)

We define the biomarker average hazard ratio (AHR) as the exponentiated expected value of \(\psi^{0}(\cdot)\) across “increasing biomarker” sub-populations. For example, \(\hbox{AHR}(2^{+})\) represents the AHR for subjects with biomarker values \(\geq 2\), via

\[\begin{equation} \tag{7} \hbox{AHR}(z^{+}):= \exp\left\{E_{Z \geq z} \psi^{0}(Z) \right\}, \qquad \hbox{AHR}(z^{-}):= \exp\left\{E_{Z \leq z} \psi^{0}(Z) \right\}, \end{equation}\]

where \(\hbox{AHR}(z^{-})\) averages in the opposite direction, across “decreasing biomarker” sub-populations.

Here the expectations are over the distribution of \(Z\); if there is a prognostic factor \(W\), the expectations are over \((Z,W)\). In practice we compute empirical averages.

3.2 Controlled direct-effect (CDE) versions (Aalen, Cook, and Røysland (2015))

Averaging across hazard ratios: Define the hazard (omitting the baseline hazard) \(\theta^{x}(z,w) = \exp(l(x,z,w))\), setting treatment to \(X=x\) given \(Z=z\) and \(W=w\). Aalen, Cook, and Røysland (2015) define the controlled direct-effect (CDE) as the ratio of the expected hazards. Here we consider the above cumulative versions (omitting possible dependence on \(W\)). Let \(\bar\theta^{x}(z+) = E_{Z \geq z}\theta^{x}(Z)\) for \(x=0,1\), and define

\[\hbox{CDE}(z^{+}):= \bar\theta^{1}(z+)/\bar\theta^{0}(z+), \quad \hbox{and} \quad \hbox{CDE}(z^{-}):= \bar\theta^{1}(z-)/\bar\theta^{0}(z-)\]
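To illustrate how the two summaries can differ, the sketch below computes AHR(2+) and CDE(2+) under the working-example spline, using a hypothetical biomarker distribution and taking \(\theta^{0}(z)\) constant (so the CDE reduces to an unweighted average of \(\exp(\psi^{0}(Z))\)); by Jensen's inequality CDE(2+) \(\geq\) AHR(2+):

Code
# AHR vs CDE for biomarker >= 2 (theta^0(z) constant, for illustration)
set.seed(4)
z <- pmin(rexp(100000, rate = 1/8), 20)  # hypothetical biomarker values
b1 <- log(3); b3 <- (log(1.25) - b1)/5
b5 <- (log(0.5) - b1 - b3*10)/(10 - 5)
psi <- function(z) b1 + b3*z + b5*pmax(z - 5, 0)
zz <- z[z >= 2]
c(AHR = exp(mean(psi(zz))),   # exponentiated average of the log-HR
  CDE = mean(exp(psi(zz))))   # ratio of average hazards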

Code
df.case <- within(df.case,{
  z <- pmin(biomarker,20)  # e.g., PD-L1 expression truncated at 20
  CTregimen <- ifelse(mAD_CTregimen1==0,"socA","socB") # Outcome stratification (Weibull/Cox model)
  })

# These are confounders (fixed) that were characteristics from observed data
confounders.name <- c("age","biomarker","male","EU_region","AP_region","histology","surgery","ecog1","mAD_CTregimen1")

3.3 Compare ‘AHR’ versions under various biomarker specifications

3.3.1 Under PH (uniform effects, hr=0.7), with strong prognostic effect \(\beta_{w}= -\log(5)\)

Code
# Outcome model is NON-stratified (strata_tte=NULL)
# PH 

log.hrs <- log(c(0.7,0.7,0.7))

dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=FALSE)

# Simulate with randomization factors "stratum" 
# Note that strata_tte="CTregimen" is a simplification (collapsing over region and metastatic) of "stratum"
# wname="AP_region" specifies the prognostic-effect factor
# bw=-log(5) is the log(hazard ratio) prognostic effect for the wname factor

# Draw very large-sample to get approximation to bias
# Note: return_df will NOT return the large simulated dataset but the "large-sample" estimates as a list
# checking=TRUE will compare the estimated tau (stratified) parameters based on the case-study dataset to 
# the versions based on the simulated dataset
# details=TRUE will output the approximations to Cox model estimators from several models ("population summaries")
# return_df =FALSE will return the "population summaries" as a list as well as the biomarker AHR functional (see below)

pop_summary1 <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=100000,wname="AP_region",bw=-log(5),strata_rand="stratum",checking=TRUE,details=TRUE,return_df=FALSE)
% censored = 0.18077 
Stratification parm (taus) df_super 0.9345439 
Stratification parm (taus) simulated= 0.9314784 
Max |loghr.po - (log.Y0-log.Y1)/tau| =  0 
Overall empirical AHR, CDE= 0.7 0.7 
AHR W=1, W=0 0.7 0.7 
CDE W=1, W=0 0.7 0.7 
Cox ITT: Un-adjusted, sR, W 0.7619259 0.7059839 0.7055706 
Cox W=1 Sub-population 0.7173293 
Cox W=0 Sub-population 0.7041526 

Code
plot_AHRs(popsummary=pop_summary1,dfcase=df.case)

3.3.2 Under “moderate” biomarker effects

Code
# Moderate biomarker
log.hrs <- log(c(0.9,0.8,0.65))
dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=FALSE)
# Strong prognostic factor 
pop_summary5 <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=100000,wname="AP_region",bw=-log(5),strata_rand="stratum",checking=TRUE,details=TRUE,return_df=FALSE)
% censored = 0.18556 
Stratification parm (taus) df_super 0.9345439 
Stratification parm (taus) simulated= 0.935636 
Max |loghr.po - (log.Y0-log.Y1)/tau| =  0 
Overall empirical AHR, CDE= 0.6944291 0.7263561 
AHR W=1, W=0 0.6924893 0.6950588 
CDE W=1, W=0 0.7231901 0.7265654 
Cox ITT: Un-adjusted, sR, W 0.7530367 0.6971935 0.6955531 
Cox W=1 Sub-population 0.7247699 
Cox W=0 Sub-population 0.692078 

Code
plot_AHRs(popsummary=pop_summary5,dfcase=df.case)

3.4 Calculating causal ‘AHR’ effects for subgroups

To evaluate the estimation properties of our procedures we calculate the empirical average within a subgroup \({\cal G}\) (say), which can be the identified subgroup, the ITT population, or any underlying subgroup. That is, for each subject with characteristics \({\bf L}=(Z,W)\), say, we have their individual log hazard-ratio contrast \(\psi^{0}({\bf L})\); we estimate \(\hbox{AHR}({\cal G}):= \exp\left\{E_{\cal G} \psi^{0}\right\}\) by calculating the empirical average

\[\begin{equation} \tag{8} \hbox{AHR}({\cal G}):= \exp(\bar\beta({\cal G})), \quad \hbox{where} \quad \bar\beta({\cal G}) =(1/n({\cal G}))\sum_{g \in {\cal G}} \psi^{0}({\bf L}_{g}), \end{equation}\] with \(n({\cal G})\) denoting the size of \({\cal G}\).
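A minimal sketch of (8): given each subject's true log hazard-ratio contrast and a subgroup indicator (both argument names hypothetical), the causal subgroup AHR is the exponentiated within-subgroup mean:

Code
# Empirical subgroup AHR per (8); psi.L and inG are hypothetical inputs
ahr_subgroup <- function(psi.L, inG) exp(mean(psi.L[inG]))
# e.g., if a simulated dataset carries the per-subject contrast as loghr.po:
# ahr_subgroup(psi.L = df$loghr.po, inG = df$biomarker >= 2)

The same helper applies with inG set to all-TRUE for the ITT population, or to the indicator of an identified subgroup.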

4 Working example for subgroup identification strategies

In the remainder we evaluate the performance of subgroup identification procedures when the underlying biomarker profile (“differential treatment effect mechanism”) is quite strong and challenges with establishing consistency within the AP region are induced by a strong prognostic effect (\(\beta_{w}=-\log(5)\)).

Note that in the following we draw a very large sample (\(n = 100{,}000\)) to approximate the asymptotic properties of Cox model analyses based on the ITT population and the AP-region subgroups (non-AP is denoted \(W=0\), and AP is \(W=1\)). The code outputs the overall AHR, which represents (8) applied to the ITT population (i.e., \({\cal G}=\) ITT), as well as (8) applied to the AP and non-AP subgroups (“AHR W=1” and “AHR W=0”, resp.).

Code
# Outcome model is NON-stratified (strata_tte=NULL)
# Strong biomarker 

log.hrs <- log(c(3,1.25,0.50))

dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=FALSE)

# Simulate with randomization factors "stratum" 
# Note that strata_tte="CTregimen" is a simplification (collapsing over region and metastatic) of "stratum"
# wname="AP_region" specifies the prognostic-effect factor
# bw=-log(5) is the log(hazard ratio) prognostic effect for the wname factor

# Draw very large-sample to get approximation to bias
# Note: return_df will NOT return the large simulated dataset but the "large-sample" estimates as a list
# checking=TRUE will compare the estimated tau (stratified) parameters based on the case-study dataset to 
# the versions based on the simulated dataset
# details=TRUE will output the approximations to Cox model estimators from several models ("population summaries")
# return_df =FALSE will return the "population summaries" as a list as well as the biomarker AHR functional (see below)

# Strong prognostic factor 
pop_summary <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=100000,wname="AP_region",bw=-log(5),strata_rand="stratum",checking=TRUE,details=TRUE,return_df=FALSE)
% censored = 0.20918 
Stratification parm (taus) df_super 0.9345439 
Stratification parm (taus) simulated= 1.013666 
Max |loghr.po - (log.Y0-log.Y1)/tau| =  0 
Overall empirical AHR, CDE= 0.7367854 1.258119 
AHR W=1, W=0 0.7206492 0.7420908 
CDE W=1, W=0 1.197564 1.262121 
Cox ITT: Un-adjusted, sR, W 0.7903797 0.7641313 0.7457578 
Cox W=1 Sub-population 1.015991 
Cox W=0 Sub-population 0.718739 

Code
# An illustrative simulated example dataset
df_example <- draw_sim_stratified(dgm=dgm,ss=123,wname="AP_region",bw=-log(5),strata_rand="stratum")

# Subgroups identified with nonAP population
df_nonAP <- subset(df_example,AP_region==0)
# Applied to AP
df_AP <- subset(df_example,AP_region==1)
df_ITT <- df_example


# True causal AHR
ahr_true = c(round(pop_summary$AHR,4))
# AHR Cox un-adjusted (only treatment)
ahr_unadj = c(round(pop_summary$ITT_unadj,4))
# Treatment plus randomization stratification sR
ahr_sR = c(round(pop_summary$ITT_sR,4))
# sR including adjustment by w
ahr_sRw = c(round(pop_summary$ITT_sRw,4))

# Relative biases 
bias_unadj <-round(100*c(pop_summary$ITT_unadj-pop_summary$AHR)/pop_summary$AHR,1)
bias_sR <-round(100*c(pop_summary$ITT_sR-pop_summary$AHR)/pop_summary$AHR,1)
bias_sRw <-round(100*c(pop_summary$ITT_sRw-pop_summary$AHR)/pop_summary$AHR,1)

# The bias within the W=1 (AP) population
ahr_w1 = c(round(pop_summary$W_1,4))
bias_w1 <-round(100*c(pop_summary$W_1-pop_summary$AHR)/pop_summary$AHR,1)
Code
plot_AHRs(popsummary=pop_summary,dfcase=df.case)

  • Overall Population (large-sample) –
    • The overall true (causal) average hazard-ratio (AHR) = 0.7368
    • The \(\approx\) asymptotic limit of un-adjusted Cox (only treatment) = 0.7904
    • The \(\approx\) asymptotic limit of stratified Cox (treatment and stratified by stratum) = 0.7641
    • The \(\approx\) asymptotic limit of stratified Cox with adjustment by \(W\) = 0.7458

The relative over-estimation biases for the Cox model estimates when un-adjusted, stratified (by randomization factors), and stratified + adjusted by the strong prognostic factor \(W\) are: 7.3%, 3.7%, and 1.2%, respectively.

4.1 Dilution of marginal Cox model within AP region

Moreover, within the AP region (\(W=1\)) the \(\approx\) limit of the Cox model estimator based on the regional (local) data is 1.016, with corresponding bias of 37.9%.

We would therefore expect challenges with establishing consistency based on evaluating the local regional data.

Let’s check whether there is any imbalance (asymptotically) in the biomarker by drawing a large sample (N=100,000) and viewing summary tables:

Code
# This time return large dataset (corresponding to that used in the above asymptotic approximations)
dflarge <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=100000,wname="AP_region",bw=-log(5),strata_rand="stratum",checking=FALSE,details=FALSE,return_df=TRUE)
# create factor versions of treatment and AP region
dflarge$trtsim <- ifelse(dflarge$treat.sim==1,"Experimental","Control")
dflarge$apregion <- ifelse(dflarge$AP_region==1,"AP","non-AP")
dflarge$bm_lt2 <- ifelse(dflarge$biomarker<2,"biomarker<2","biomarker>=2")
table1( ~ biomarker + bm_lt2 | trtsim, data=dflarge, caption=c("ITT by treatment arm"))
ITT by treatment arm

                     Control         Experimental    Overall
                     (N=50001)       (N=49999)       (N=100000)
biomarker
  Mean (SD)          7.83 (7.06)     7.82 (7.07)     7.82 (7.07)
  Median [Min, Max]  5.00 [0, 20.0]  5.00 [0, 20.0]  5.00 [0, 20.0]
bm_lt2
  biomarker<2        11930 (23.9%)   11990 (24.0%)   23920 (23.9%)
  biomarker>=2       38071 (76.1%)   38009 (76.0%)   76080 (76.1%)
Code
table1( ~ biomarker + bm_lt2 | apregion, data=dflarge, caption=c("ITT by AP region"))
ITT by AP region

                     AP                non-AP          Overall
                     (N=24472)         (N=75528)       (N=100000)
biomarker
  Mean (SD)          7.95 (6.87)       7.78 (7.13)     7.82 (7.07)
  Median [Min, Max]  5.00 [1.00, 20.0] 5.00 [0, 20.0]  5.00 [0, 20.0]
bm_lt2
  biomarker<2        4922 (20.1%)      18998 (25.2%)   23920 (23.9%)
  biomarker>=2       19550 (79.9%)     56530 (74.8%)   76080 (76.1%)
Code
table1( ~ biomarker + bm_lt2 | trtsim, data=subset(dflarge,AP_region==1), caption=c("AP region by treatment arm"))
AP region by treatment arm

                     Control           Experimental      Overall
                     (N=12237)         (N=12235)         (N=24472)
biomarker
  Mean (SD)          7.96 (6.87)       7.94 (6.88)       7.95 (6.87)
  Median [Min, Max]  5.00 [1.00, 20.0] 5.00 [1.00, 20.0] 5.00 [1.00, 20.0]
bm_lt2
  biomarker<2        2442 (20.0%)      2480 (20.3%)      4922 (20.1%)
  biomarker>=2       9795 (80.0%)      9755 (79.7%)      19550 (79.9%)
Code
rm("dflarge")

Next, a simulated example with the same sample size as the case-study:

  • Here we simulated a study according to the biomarker effects described above
  • Summarize the non-AP and AP populations
  • Apply subgroup identification procedures to the non-AP data and apply to AP
Code
# non-stratified
kmH.fit<-KM.plot.2sample.weighted(df=df_example, 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",
risk.set=TRUE, by.risk=6,
details=TRUE,
Xlab="Months",Ylab="Overall survival",
arms=c("Experimental","Control"))
Cox un-adjusted HR= 0.76 
Cox CIs= 0.624 0.925 
My p-value and survdiff= 0.006076202 0.006076202 
My z^2 and survdiff= 7.527564 7.527564 
Comparing with R's survfit 
Treat=1: Max error max|KM(survfit)[t]-KM(mine)[t]| = 0 
Treat=0: Max error max|KM(survfit)[t]-KM(mine)[t]| = 0 
Comparing with R's survfit (experimental 
Treat=1: n,events= 252 185 
Median, Lower, Upper= 17.19713 14.14832 21.0917 
survfit medians 
  records     n.max   n.start    events     rmean se(rmean)    median   0.95LCL 
   252.00    252.00    252.00    185.00     27.33      1.55     17.20     14.15 
  0.95UCL 
    21.09 
Comparing with R's survfit (Control 
Treat=0: n,events= 255 217 
Median, Lower, Upper= 13.42839 11.46187 17.93717 
survfit medians 
  records     n.max   n.start    events     rmean se(rmean)    median   0.95LCL 
   255.00    255.00    255.00    217.00     21.64      1.29     13.43     11.46 
  0.95UCL 
    17.94 
Code
title(main="ITT Population")

4.2 Now evaluate non-AP and AP —> AP is “diluted”

Code
# non-stratified
par(mfrow=c(1,2))
kmH.fit<-KM.plot.2sample.weighted(df=df_nonAP, 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",
risk.set=TRUE, by.risk=6, details=FALSE,
show.logrank=FALSE, put.legend.arms="top",
Xlab="Months",Ylab="Overall survival",
arms=c("Experimental","Control"))
title(main="Non-AP Population")

kmH.fit<-KM.plot.2sample.weighted(df=df_AP, 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",
risk.set=TRUE, by.risk=6, details=FALSE,
show.logrank=FALSE, put.legend.arms="top",
Xlab="Months",Ylab="Overall survival",
arms=c("Experimental","Control"))
title(main="AP Population")

Note that results stratified by randomization (or true outcome strata) are similar

Code
# stratified
par(mfrow=c(1,2))
kmH.fit<-KM.plot.2sample.weighted(df=df_nonAP, 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",strata.name="strata.simR",
risk.set=TRUE, by.risk=6, details=FALSE,
show.logrank=FALSE, put.legend.arms="top",
Xlab="Months",Ylab="Overall survival",
arms=c("Experimental","Control"))
title(main="Non-AP Population")

kmH.fit<-KM.plot.2sample.weighted(df=df_AP, 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",strata.name="strata.simR",
risk.set=TRUE, by.risk=6, details=FALSE,
show.logrank=FALSE, put.legend.arms="top",
Xlab="Months",Ylab="Overall survival",
arms=c("Experimental","Control"))
title(main="AP Population")

Check balance within simulated trial

Code
# Check covariate balance in the simulated example dataset (df_example)
df <- df_example
# create factor versions of treatment and AP region
df$trtsim <- ifelse(df$treat.sim==1,"Experimental","Control")
df$apregion <- ifelse(df$AP_region==1,"AP","non-AP")
df$bm_lt2 <- ifelse(df$biomarker<2,"biomarker<2","biomarker>=2")
table1( ~ biomarker + bm_lt2 | trtsim, data=df, caption=c("ITT by treatment arm"))
ITT by treatment arm

                     Control          Experimental      Overall
                     (N=255)          (N=252)           (N=507)
biomarker
  Mean (SD)          8.29 (7.45)      7.35 (6.63)       7.82 (7.06)
  Median [Min, Max]  5.00 [0, 20.0]   5.00 [1.00, 20.0] 5.00 [0, 20.0]
bm_lt2
  biomarker<2        59 (23.1%)       61 (24.2%)        120 (23.7%)
  biomarker>=2       196 (76.9%)      191 (75.8%)       387 (76.3%)
Code
table1( ~ biomarker + bm_lt2 | apregion, data=df, caption=c("ITT by AP region"))
ITT by AP region

                     AP                non-AP          Overall
                     (N=125)           (N=382)         (N=507)
biomarker
  Mean (SD)          7.88 (6.87)       7.81 (7.13)     7.82 (7.06)
  Median [Min, Max]  5.00 [1.00, 20.0] 5.00 [0, 20.0]  5.00 [0, 20.0]
bm_lt2
  biomarker<2        25 (20.0%)        95 (24.9%)      120 (23.7%)
  biomarker>=2       100 (80.0%)       287 (75.1%)     387 (76.3%)
Code
table1( ~ biomarker + bm_lt2 | trtsim, data=subset(df,AP_region==1), caption=c("AP region by treatment arm"))
AP region by treatment arm

                        Control            Experimental       Overall
                        (N=62)             (N=63)             (N=125)
biomarker
  Mean (SD)             8.42 (7.13)        7.35 (6.62)        7.88 (6.87)
  Median [Min, Max]     5.00 [1.00, 20.0]  5.00 [1.00, 20.0]  5.00 [1.00, 20.0]
bm_lt2
  biomarker<2           10 (16.1%)         15 (23.8%)         25 (20.0%)
  biomarker>=2          52 (83.9%)         48 (76.2%)         100 (80.0%)

4.3 Looking at 9 simulated AP trials when AP region is a strong factor (\(\beta_{w} = -\log(5)\)) and null (\(\beta_{w}=0\))

4.3.1 Nine simulated AP realizations under \(\beta_{w}= -\log(5)\)

Code
par(mfrow=c(3,3))

for(ss in 1:9){
df_ex <- draw_sim_stratified(dgm=dgm,ss=ss,wname="AP_region",bw=-log(5),strata_rand="stratum")
kmH.fit<-KM.plot.2sample.weighted(df=subset(df_ex,AP_region==1), 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",strata.name="strata.simR",
risk.set=TRUE, by.risk=6, details=FALSE, show.med=FALSE, cox.cex= 1.25,
show.logrank=FALSE, put.legend.arms="top",
Xlab="Months",Ylab="Overall survival",
arms=c("Experimental","Control"))
title(main="AP Population")
}

4.3.2 Nine simulated AP realizations under \(\beta_{w} = 0\)

Code
par(mfrow=c(3,3))

for(ss in 1:9){
df_ex <- draw_sim_stratified(dgm=dgm,ss=ss,wname="AP_region",bw=-log(1),strata_rand="stratum")
kmH.fit<-KM.plot.2sample.weighted(df=subset(df_ex,AP_region==1), 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",strata.name="strata.simR",
risk.set=TRUE, by.risk=6, details=FALSE,show.med=FALSE, cox.cex= 1.25,
show.logrank=FALSE, put.legend.arms="top",
Xlab="Months",Ylab="Overall survival",
arms=c("Experimental","Control"))
title(main="AP Population")
}

4.4 Compare non-parametric (cubic-spline) fit with true \(\psi^{0}\)

For ITT dataset

Code
# ydel (default=0.25) is how much vertical room to reserve for the counts
temp <- cox_cs_fit2(df=df_example,tte.name="y.sim",event.name="event.sim",treat.name="treat.sim",
strata.name="strata.simO",z.name=c("z"), details=FALSE ,boots=0, xlab=c("Biomarker"),maxz = 25, ydel=0.5, show_plot=TRUE,truebeta.name="loghr.po")
title("ITT")

Within non-AP and AP regions

Code
par(mfrow=c(1,2))

temp <- cox_cs_fit(df=df_nonAP,tte.name="y.sim",event.name="event.sim",treat.name="treat.sim",
strata.name="strata.simO",z.name=c("z"), details=FALSE ,boots=0, xlab=c("Biomarker"),maxz = 25, ydel=0.5, show_plot=TRUE,truebeta.name="loghr.po")
title("Non-AP")


temp <- cox_cs_fit(df=df_AP,tte.name="y.sim",event.name="event.sim",treat.name="treat.sim",
strata.name="strata.simO",z.name=c("z"), details=FALSE ,boots=0, xlab=c("Biomarker"),maxz = 25, ydel=0.5, show_plot=TRUE,truebeta.name="loghr.po")
title("AP")
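Note that cox_cs_fit and cox_cs_fit2 are project-specific helpers sourced above from the forestsearch code directory. For readers without them, the following generic sketch shows one way to estimate a smoothly varying treatment log hazard-ratio over a biomarker using a natural-spline treatment-by-biomarker interaction in survival::coxph; the simulated data and variable names here are illustrative and are not the document's dgm.

```r
library(survival)
library(splines)

# Generic sketch (not the project's cox_cs_fit): treatment log-HR as a
# smooth function of a biomarker z via a natural-spline interaction.
set.seed(7)
n <- 400
z <- runif(n, 0, 20)
treat <- rbinom(n, 1, 0.5)
loghr_true <- log(1.5) - 0.1 * z           # benefit grows with z
d <- data.frame(time  = rexp(n, rate = 0.05 * exp(loghr_true * treat)),
                event = 1, treat = treat, z = z)

bas <- ns(d$z, df = 3)                     # spline basis for z
fit <- coxph(Surv(time, event) ~ treat * bas, data = d)

# Treatment log-HR over a biomarker grid: main effect + interaction basis
zgrid <- seq(0, 20, by = 0.5)
cf    <- coef(fit)
loghr <- cf["treat"] + drop(predict(bas, zgrid) %*% cf[grep("^treat:", names(cf))])
plot(zgrid, exp(loghr), log = "y", type = "l",
     xlab = "Biomarker", ylab = "HR(z)")
abline(h = 1, lty = 2)
```

The fitted curve exp(loghr) plays the role of the non-parametric estimate that is compared with the true \(\psi^{0}\) above.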

4.5 Examine basic forest plot of non-AP data used for identification

Code
fp <- forestplot_baseline(df=df_nonAP,confounders.name=confounders.name,
arrow_text=c("favors Treatment","Control"),E.name=c("Treatment"),C.name=c("Control"),outcome.name="y.sim",event.name="event.sim",treat.name="treat.sim",
xlim = c(0.25, 2.0), ticks_at = c(0.65, 1.0, 1.3))

plot(fp$p)

Code
aa <- paste(confounders.name, collapse=" + ")
aa <- paste0("~ " ,aa)
aa <- paste0(aa," | treat_name")
aa <- as.formula(aa)

table1( aa , data=df_nonAP, caption=c("Simulated dataset: non-AP (training data)"))
Simulated dataset: non-AP (training data)

                        Control             Experimental        Overall
                        (N=189)             (N=193)             (N=382)
age
  Mean (SD)             59.2 (13.0)         60.0 (11.6)         59.6 (12.3)
  Median [Min, Max]     62.0 [23.0, 87.0]   61.0 [22.0, 83.0]   61.0 [22.0, 87.0]
biomarker
  Mean (SD)             7.34 (6.98)         8.26 (7.27)         7.81 (7.13)
  Median [Min, Max]     5.00 [1.00, 20.0]   5.00 [0, 20.0]      5.00 [0, 20.0]
male
  Mean (SD)             0.730 (0.445)       0.751 (0.433)       0.741 (0.439)
  Median [Min, Max]     1.00 [0, 1.00]      1.00 [0, 1.00]      1.00 [0, 1.00]
EU_region
  Mean (SD)             0.481 (0.501)       0.420 (0.495)       0.450 (0.498)
  Median [Min, Max]     0 [0, 1.00]         0 [0, 1.00]         0 [0, 1.00]
AP_region
  Mean (SD)             0 (0)               0 (0)               0 (0)
  Median [Min, Max]     0 [0, 0]            0 [0, 0]            0 [0, 0]
histology
  Mean (SD)             0.344 (0.476)       0.404 (0.492)       0.374 (0.485)
  Median [Min, Max]     0 [0, 1.00]         0 [0, 1.00]         0 [0, 1.00]
surgery
  Mean (SD)             0.233 (0.424)       0.264 (0.442)       0.249 (0.433)
  Median [Min, Max]     0 [0, 1.00]         0 [0, 1.00]         0 [0, 1.00]
ecog1
  Mean (SD)             0.587 (0.494)       0.611 (0.489)       0.599 (0.491)
  Median [Min, Max]     1.00 [0, 1.00]      1.00 [0, 1.00]      1.00 [0, 1.00]
mAD_CTregimen1
  Mean (SD)             0.476 (0.501)       0.487 (0.501)       0.482 (0.500)
  Median [Min, Max]     0 [0, 1.00]         0 [0, 1.00]         0 [0, 1.00]

5 Subgroup identification criteria: Options for targeting subgroup effects

5.1 Finding Non-AP subgroup with highest consistency rate (in favor of control)

In the following we evaluate subgroups defined by single factors (e.g., “biomarker < J”, “age <= 65”, etc.) and select, among those whose Cox hazard-ratio estimate meets the selection criterion (log hazard-ratio estimate \(\hat\beta \geq \log(0.90)\), say), the subgroup with the highest “consistency rate”, required to be at least \(90\%\).

That is, we seek subgroups whose hazard-ratio estimate is at least \(0.90\), which may indicate limited benefit. Subgroup candidates are ranked by highest consistency rate and, among ties, by highest hazard-ratio estimate.

This is one option; we discuss alternatives below.

To do: refer the criterion (hazard-ratio thresholds) to the corresponding equations in León et al. (2024)
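The split-sample “consistency rate” is computed internally by forestsearch; the exact implementation is not reproduced here. The following is a minimal sketch of one plausible reading: the proportion of random half-splits of the candidate subgroup in which the Cox hazard-ratio estimate meets the hr.consistency threshold in both halves. The dataset toy and the helper consistency_rate are hypothetical.

```r
library(survival)

# Hedged sketch (not the forestsearch internals): a split-sample
# consistency rate for a candidate "harm" subgroup -- the proportion of
# random half-splits in which the Cox HR estimate is >= hr.consistency
# in BOTH halves.
consistency_rate <- function(df, hr.consistency = 0.80, n.splits = 200) {
  n <- nrow(df)
  hits <- replicate(n.splits, {
    idx <- sample.int(n, floor(n / 2))
    hr1 <- exp(coef(coxph(Surv(time, event) ~ treat, data = df[idx, ])))
    hr2 <- exp(coef(coxph(Surv(time, event) ~ treat, data = df[-idx, ])))
    (hr1 >= hr.consistency) && (hr2 >= hr.consistency)
  })
  mean(hits)
}

# Toy subgroup where treatment is clearly harmful (true HR = 1.5)
set.seed(123)
n <- 300
treat <- rbinom(n, 1, 0.5)
toy <- data.frame(time  = rexp(n, rate = 0.1 * exp(log(1.5) * treat)),
                  event = 1, treat = treat)
consistency_rate(toy)  # close to 1 for a clearly harmful subgroup
```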

Code
# Show the first (up to) 10 subgroups ("showten_subgroups=TRUE") meeting identification criteria 
# Subgroups based on only single-factors ("maxk=1")

# Note: sg_focus="hr" sorts subgroups by consistency rate and hazard-ratio estimate.
# The subgroup with the largest consistency rate is selected; ties among
# consistency rates are broken by the largest HR estimate satisfying the
# (minimum) consistency criterion (pconsistency.threshold)

# Note: we exclude cuts for biomarker which are "<=" in order to only consider "<" 
fs <- forestsearch(df.analysis=df_nonAP, df.test=df_AP, confounders.name=confounders.name,
outcome.name="y.sim",treat.name="treat.sim", event.name="event.sim",
potentialOutcome.name="loghr.po",
hr.threshold = 0.90, hr.consistency = 0.80, pconsistency.threshold = 0.90,
sg_focus = "hr",
showten_subgroups=TRUE, details=TRUE,
conf_force=c("age <= 65","biomarker < 1","biomarker < 2","biomarker < 3",
"biomarker < 4","biomarker < 5","biomarker < 6","biomarker < 7","biomarker < 8",
"biomarker < 9","biomarker < 10"), exclude_cuts = c("biomarker <="),
maxk=1, plot.sg=FALSE)

if(output) save(fs,file="output/sim3_k1tenhr.Rdata")

Display the first (up to) 10 subgroups and Kaplan-Meier plots within the estimated subgroups –

Notice that the displayed subgroups are sorted by decreasing HR estimate, and the selected subgroup corresponds to the highest hazard-ratio estimate.

Code
if(loadit) load(file="output/sim3_k1tenhr.Rdata")
# Consistency for subgroups contained in "grp.consistency"
sg_tables(fs=fs,which_df="est",est_caption="Non-AP (training) estimates")$sg10_out
Subgroups formed by single-factors
maxk=1
M.1 N E hr Pcons
{biomarker < 5} 149 148 1.658330 1.000000
{biomarker < 4} 148 147 1.643559 1.000000
{biomarker < 3} 135 134 1.584584 1.000000
{biomarker < 6} 219 218 1.570647 1.000000
{biomarker < 7} 235 233 1.563982 1.000000
{biomarker < 8} 238 236 1.537168 1.000000
{biomarker < 10} 241 239 1.456963 1.000000
{biomarker < 2} 95 94 1.655311 0.998000
Code
#sg_tables(fs=fs,which_df="est",est_caption="Non-AP (training) estimates",potentialOutcome.name="loghr.po")$tab_estimates

# Show detailed estimates for the next analysis
id_harm <- paste(fs$sg.harm,collapse=" & ")

The selected subgroup is {biomarker < 5}

Next plot the K-M curves within the estimated non-AP (training) subgroups.

Code
plot_found.subgroups(fs=fs, df=fs$df.est, title_itt=c("Non-AP ITT"))

5.2 Finding Non-AP subgroup with smallest sample size meeting selection criteria

We now select the smallest subgroup meeting: log hazard-ratio estimate \(\hat\beta \geq \log(0.90)\) with “consistency rate” at least \(90\)%.

Code
# sg_focus="minSG" selects the smallest subgroup meeting the selection criteria

fs <- forestsearch(df.analysis=df_nonAP, df.test=df_AP, confounders.name=confounders.name,
outcome.name="y.sim",treat.name="treat.sim", event.name="event.sim",
potentialOutcome.name="loghr.po",
hr.threshold = 0.90, hr.consistency = 0.80, pconsistency.threshold = 0.90,
sg_focus = "minSG",
showten_subgroups=TRUE, details=TRUE,
conf_force=c("age <= 65","biomarker < 1","biomarker < 2","biomarker < 3",
"biomarker < 4","biomarker < 5","biomarker < 6","biomarker < 7","biomarker < 8",
"biomarker < 9","biomarker < 10"), exclude_cuts=c("biomarker <="),
maxk=1, plot.sg=TRUE)

if(output) save(fs,file="output/sim3_k1tenmin.Rdata")
Code
if(loadit) load(file="output/sim3_k1tenmin.Rdata")

# Training summaries
SG_est <- sg_tables(fs=fs,which_df="est",est_caption="Non-AP (training) estimates",potentialOutcome.name="loghr.po")

id_harm <- paste(fs$sg.harm,collapse=" & ")

Top subgroups (up to 10) meeting the selection criteria:

Code
# Top 10 (up to 10) SG's
SG_est$sg10_out
Subgroups formed by single-factors
maxk=1
M.1 N E hr Pcons
{biomarker < 2} 95 94 1.655311 0.998000
{biomarker < 3} 135 134 1.584584 1.000000
{biomarker < 4} 148 147 1.643559 1.000000
{biomarker < 5} 149 148 1.658330 1.000000
{biomarker < 6} 219 218 1.570647 1.000000
{biomarker < 7} 235 233 1.563982 1.000000
{biomarker < 8} 238 236 1.537168 1.000000
{biomarker < 10} 241 239 1.456963 1.000000

Estimates within training (un-adjusted). Note that “AHR(po)” denotes equation (8) (we refer to the log hazard-ratio contrast as the “potential outcome”) applied to the respective subgroups in the table (e.g., \({\cal G}=\) ITT).
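For concreteness, in the simulation code later in this document the potential-outcome AHR within a subgroup is computed as exp(mean(loghr.po)), i.e., the geometric mean of the subject-level hazard ratios; whether this coincides term-for-term with equation (8) depends on the paper's definition. The helper ahr_po and the toy subgroup below are illustrative only.

```r
# POhr_sg in the simulation code is exp(mean(loghr.po)) within the
# subgroup, i.e., the geometric mean of the subject-level hazard ratios.
ahr_po <- function(loghr_po) exp(mean(loghr_po))

# Hypothetical subgroup: half the subjects have true HR 0.5, half HR 1.0
loghr <- log(c(rep(0.5, 50), rep(1.0, 50)))
ahr_po(loghr)  # sqrt(0.5) = 0.707..., not the arithmetic mean 0.75
```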

Code
SG_est$tab_estimates
Non-AP (training) estimates
Subgroup n n1 events m1 m0 RMST HR (95% CI) AHR(po)
ITT 382 (100%) 189 (49%) 343 (90%) 11.8 10.7 4.72 (1.58,7.86) 0.68 (0.55,0.84) 0.74
Questionable 95 (25%) 46 (48%) 94 (99%) 6.2 8.2 -3.77 (-7.14,-0.39) 1.66 (1.09,2.52) 2.52
Recommend 287 (75%) 143 (50%) 249 (87%) 16 11.8 7.28 (3.68,10.87) 0.55 (0.42,0.71) 0.49

The selected subgroup is {biomarker < 2}

Next plot the K-M curves within the estimated non-AP subgroups.

Code
plot_found.subgroups(fs=fs, df=fs$df.est, title_itt=c("Non-AP ITT"))

Next, display the estimates within the AP population (testing data)

Code
# Testing summaries
SG_test <- sg_tables(fs=fs,which_df="testing",est_caption="AP (testing) estimates",potentialOutcome.name="loghr.po")
Code
SG_test$tab_estimates
AP (testing) estimates
Subgroup n n1 events m1 m0 RMST HR (95% CI) AHR(po)
ITT 125 (100%) 63 (50%) 59 (47%) 49.3 NA 0.37 (-5.42,6.16) 1.08 (0.65,1.81) 0.73
Questionable 25 (20%) 15 (60%) 15 (60%) 26.3 NA -7.19 (-17.55,3.16) 3.43 (0.96,12.28) 2.52
Recommend 100 (80%) 48 (48%) 44 (44%) NA NA 3.02 (-2.54,8.59) 0.77 (0.42,1.39) 0.54

K-M curves applied to AP subgroups.

Code
plot_found.subgroups(fs=fs, df=fs$df.test, title_itt=c("AP ITT"))

Now, if we are only concerned with finding the selected subgroup, the above calculations can be made much faster by turning off the output of the first 10 subgroups meeting the selection criteria (“showten_subgroups=FALSE” is the default). This is strongly recommended when running simulations, or when there is interest in calculating adjusted hazard-ratio estimates for the selected subgroup. It is important to note, however, that in our applications the non-AP data are used for subgroup identification while the estimates of interest are based on the local AP data. The non-AP data therefore serve as the training dataset and the AP data as the testing dataset, where such adjustment is not necessary (that is, the AP data are not used for subgroup selection).

Here we turn off the display of the first ten subgroups (the forestsearch default) and apply the estimated subgroup to the AP data.

Code
# df.test = df_AP indicates that the selected subgroup identification flags 
# (based on df_nonAP analysis) will be appended to df_AP

# Speed is increased by first sorting candidates by subgroup size; once a
# subgroup meets the consistency criterion the search stops

fs <- forestsearch(df.analysis=df_nonAP, df.test=df_AP, 
confounders.name=confounders.name,
outcome.name="y.sim", treat.name="treat.sim", event.name="event.sim",
potentialOutcome.name="loghr.po",
hr.threshold = 0.90, hr.consistency = 0.80, pconsistency.threshold = 0.90,
sg_focus="minSG", details=TRUE,
conf_force=c("age <= 65","biomarker < 1","biomarker < 2","biomarker < 3",
"biomarker < 4","biomarker < 5","biomarker < 6","biomarker < 7","biomarker < 8",
"biomarker < 9","biomarker < 10"), exclude_cuts=c("biomarker <="), maxk=1)
# of continuous/categorical characteristics 2 7 
Continuous characteristics: age biomarker 
Categorical characteristics: male EU_region AP_region histology surgery ecog1 mAD_CTregimen1 
Default cuts included (1st 20) age <= mean(age) age <= median(age) age <= qlow(age) age <= qhigh(age) biomarker <= mean(biomarker) biomarker <= median(biomarker) biomarker <= qlow(biomarker) biomarker <= qhigh(biomarker) 
Categorical: male EU_region AP_region histology surgery ecog1 mAD_CTregimen1 
Dropping variables (only 1 level): AP_region 
# of candidate subgroup factors= 21 
 [1] "age <= 65"      "biomarker < 1"  "biomarker < 2"  "biomarker < 3" 
 [5] "biomarker < 4"  "biomarker < 5"  "biomarker < 6"  "biomarker < 7" 
 [9] "biomarker < 8"  "biomarker < 9"  "biomarker < 10" "age <= 59.6"   
[13] "age <= 61"      "age <= 52"      "age <= 68"      "male"          
[17] "EU_region"      "histology"      "surgery"        "ecog1"         
[21] "mAD_CTregimen1"
Number of factors evaluated= 21 
Confounders per grf screening q9 q8 q11 q10 q7 q21 q17 q18 q20 q13 q12 q14 q1 q6 q3 q15 q4 q19 q16 q5 q2 
          Factors Labels VI(grf)
9   biomarker < 8     q9  0.2347
8   biomarker < 7     q8  0.1637
11 biomarker < 10    q11  0.1044
10  biomarker < 9    q10  0.1021
7   biomarker < 6     q7  0.0807
21 mAD_CTregimen1    q21  0.0408
17      EU_region    q17  0.0370
18      histology    q18  0.0329
20          ecog1    q20  0.0316
13      age <= 61    q13  0.0261
12    age <= 59.6    q12  0.0251
14      age <= 52    q14  0.0195
1       age <= 65     q1  0.0182
6   biomarker < 5     q6  0.0152
3   biomarker < 2     q3  0.0149
15      age <= 68    q15  0.0127
4   biomarker < 3     q4  0.0110
19        surgery    q19  0.0109
16           male    q16  0.0094
5   biomarker < 4     q5  0.0089
2   biomarker < 1     q2  0.0000
Number of possible configurations (<= maxk): maxk, # <= maxk 1 42 
Approximately 5% of max_count met: minutes 8.333333e-05 
Approximately 10% of max_count met: minutes 0.0001166667 
Approximately 20% of max_count met: minutes 0.0001833333 
Approximately 33% of max_count met: minutes 0.0002666667 
Approximately 50% of max_count met: minutes 0.00035 
Approximately 75% of max_count met: minutes 0.0005166667 
Approximately 90% of max_count met: minutes 6e-04 
# of subgroups evaluated based on (up to) maxk-factor combinations 42 
% of all-possible combinations (<= maxk) 100 
k.max= 1 
Events criteria for control,exp= 10 10 
# of subgroups with events less than criteria: control, experimental 1 1 
# of subgroups with sample size less than criteria 1 
# of subgroups meeting all criteria = 40 
# of subgroups fitted (Cox model estimable) = 40 
*Subgroup Searching Minutes=* 0.00065 
Number of subgroups meeting HR threshold 10 
Subgroup candidate(s) found (FS) 
# of candidate subgroups (meeting HR criteria) =  10 
SGs (1st 10) meeting screening thresholds sorted by sg_focus= minSG 
Kmet (found), Subgroup, % Consistency= 0 !{age <= 68} 0.75 
Consistency met! 
# of splits, Kmet= 1000 1 
Subgroup, % Consistency Met= {biomarker < 2} 0.998 
SG focus= minSG 
Subgroup Consistency Minutes= 0.042 
Subgroup found (FS) 
Minutes forestsearch overall= 0.04796667 
Code
if(output) save(fs,file="output/sim3_k1min.Rdata")

Apply the estimated subgroup to the AP data —

Code
if(loadit) load(file="output/sim3_k1min.Rdata")

#sg_tables(fs=fs,which_df="est",est_caption="Non-AP (training) estimates",potentialOutcome.name="loghr.po")$tab_estimates

# Testing summaries
SG_test <- sg_tables(fs=fs,which_df="testing",est_caption="AP (testing) estimates",potentialOutcome.name="loghr.po")
Code
SG_test$tab_estimates
AP (testing) estimates
Subgroup n n1 events m1 m0 RMST HR (95% CI) AHR(po)
ITT 125 (100%) 63 (50%) 59 (47%) 49.3 NA 0.37 (-5.42,6.16) 1.08 (0.65,1.81) 0.73
Questionable 25 (20%) 15 (60%) 15 (60%) 26.3 NA -7.19 (-17.55,3.16) 3.43 (0.96,12.28) 2.52
Recommend 100 (80%) 48 (48%) 44 (44%) NA NA 3.02 (-2.54,8.59) 0.77 (0.42,1.39) 0.54

K-M curves applied to AP subgroups (repeat of above).

Code
plot_found.subgroups(fs=fs, df=fs$df.test, title_itt=c("AP ITT"))

5.3 Finding the largest non-AP subgroup with strong benefit

Here we directly target the largest subgroup with a strong benefit. This is done by simply reversing the role of treatment: on the reversed scale we seek the largest subgroup whose (inverted) hazard ratio exceeds a threshold, which corresponds to an original-scale hazard ratio below a clinical threshold (e.g., \(\leq 0.6\)).
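The reversal trick relies on an exact invariance of the Cox model: recoding the arm indicator as one minus treatment shifts the linear predictor by a constant, which cancels in the partial likelihood, so the estimated log hazard ratio is exactly negated and the fitted HR is exactly inverted. A small self-contained check on simulated data (variable names are illustrative):

```r
library(survival)

# Recoding treat -> 1 - treat negates the Cox log-HR estimate exactly,
# so HR_reversed = 1 / HR: a benefit threshold of 0.6 on the original
# scale becomes the threshold 1/0.6 on the reversed scale.
set.seed(1)
n <- 200
treat <- rbinom(n, 1, 0.5)
d <- data.frame(time    = rexp(n, rate = 0.1 * exp(log(0.6) * treat)),
                event   = 1,
                treat   = treat,
                control = 1 - treat)

hr     <- exp(coef(coxph(Surv(time, event) ~ treat,   data = d)))
hr_rev <- exp(coef(coxph(Surv(time, event) ~ control, data = d)))
unname(hr_rev * hr)  # = 1 (up to convergence tolerance)
```

This is why the call below passes hr.threshold = 1/0.6 and hr.consistency = 1/0.7 after constructing control.sim = 1 - treat.sim.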

Code
# sg_focus="maxSG" selects the largest subgroup meeting the selection criteria
# Reverse the role of treatment so that we find the "worst control" subgroup
dfnew <- copy(df_nonAP)
dfnew$control.sim <- 1-dfnew$treat.sim

dftest <- copy(df_AP)
dftest$control.sim <- 1-dftest$treat.sim

fs_k1_10_maxSG <- forestsearch(df.analysis=dfnew, df.test=dftest, confounders.name=confounders.name,
outcome.name="y.sim",treat.name="control.sim", event.name="event.sim",
potentialOutcome.name="loghr.po",
est.scale="1/hr", hr.threshold = 1/0.6, hr.consistency = 1/0.7, pconsistency.threshold = 0.90,
sg_focus = "maxSG",
showten_subgroups=TRUE, details=TRUE,
conf_force=c("age <= 65","biomarker < 1","biomarker < 2","biomarker < 3",
"biomarker < 4","biomarker < 5","biomarker < 6","biomarker < 7","biomarker < 8",
"biomarker < 9","biomarker < 10"), exclude_cuts=c("biomarker <="),
maxk=1, plot.sg=TRUE)

if(output) save(fs_k1_10_maxSG,file="output/sim3_k1tenmax.Rdata")
Code
if(loadit) load(file="output/sim3_k1tenmax.Rdata")
fs <- fs_k1_10_maxSG
# Training summaries
SG_est <- sg_tables(fs=fs,which_df="est",est_caption="Non-AP (training) estimates",potentialOutcome.name="loghr.po")

Top subgroups (up to 10) meeting the selection criteria:

Code
# Top 10 (up to 10) SG's
SG_est$sg10_out
Subgroups formed by single-factors
maxk=1
M.1 N E hr Pcons
!{biomarker < 2} 287 249 0.548155 0.951000
!{biomarker < 3} 247 209 0.467949 0.998000
!{biomarker < 4} 234 196 0.437172 0.999000
!{biomarker < 5} 233 195 0.430730 0.998000
!{biomarker < 6} 163 125 0.303895 1.000000
!{biomarker < 7} 147 110 0.237411 1.000000
!{biomarker < 8} 144 107 0.219942 1.000000
!{biomarker < 10} 141 104 0.210599 1.000000

Estimates within training (un-adjusted)

Code
SG_est$tab_estimates
Non-AP (training) estimates
Subgroup n n1 events m1 m0 RMST HR (95% CI) AHR(po)
ITT 382 (100%) 189 (49%) 343 (90%) 11.8 10.7 4.72 (1.58,7.86) 0.68 (0.55,0.84) 0.74
Questionable 95 (25%) 46 (48%) 94 (99%) 6.2 8.2 -3.77 (-7.14,-0.39) 1.66 (1.09,2.52) 2.52
Recommend 287 (75%) 143 (50%) 249 (87%) 16 11.8 7.28 (3.68,10.87) 0.55 (0.42,0.71) 0.49

The selected subgroup is {biomarker < 2}

Next plot the K-M curves within the estimated non-AP subgroups.

Code
plot_found.subgroups(fs=fs, df=fs$df.est, title_itt=c("Non-AP ITT"))

Next, display the estimates within the AP population (testing data)

Code
# Testing summaries
SG_test <- sg_tables(fs=fs,which_df="testing",est_caption="AP (testing) estimates",potentialOutcome.name="loghr.po")
Code
SG_test$tab_estimates
AP (testing) estimates
Subgroup n n1 events m1 m0 RMST HR (95% CI) AHR(po)
ITT 125 (100%) 63 (50%) 59 (47%) 49.3 NA 0.37 (-5.42,6.16) 1.08 (0.65,1.81) 0.73
Questionable 25 (20%) 15 (60%) 15 (60%) 26.3 NA -7.19 (-17.55,3.16) 3.43 (0.96,12.28) 2.52
Recommend 100 (80%) 48 (48%) 44 (44%) NA NA 3.02 (-2.54,8.59) 0.77 (0.42,1.39) 0.54

K-M curves applied to AP subgroups.

Code
plot_found.subgroups(fs=fs, df=fs$df.test, title_itt=c("AP ITT"))

5.4 What happens under the “null” where the treatment effect is uniform?

Code
# Null uniform benefit hr=0.7
log.hrs <- log(c(0.7,0.7,0.7))

dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=TRUE)

Code
df_null <- draw_sim_stratified(dgm=dgm,ss=1,wname="AP_region",bw=-log(5),strata_rand="stratum",checking=FALSE,details=FALSE,return_df=TRUE)

Note that under the null of a uniform treatment effect, the challenge of meeting consistency in the AP region is not prevalent in the first place —

Nevertheless, let’s see what happens when we apply the same subgroup identification criteria (to non-AP) as above.

Code
kmH.fit<-KM.plot.2sample.weighted(df=subset(df_null,AP_region==1), 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",
risk.set=TRUE, by.risk=6, details=FALSE,
show.logrank=FALSE, put.legend.arms="top",
Xlab="Months",Ylab="Overall survival",
arms=c("Experimental","Control"))
title(main="AP Population")

Code
fs <- forestsearch(df.analysis=subset(df_null,AP_region==0),  
confounders.name=confounders.name,
outcome.name="y.sim", treat.name="treat.sim", event.name="event.sim",
potentialOutcome.name="loghr.po",
hr.threshold = 0.90, hr.consistency = 0.80, pconsistency.threshold = 0.90,
sg_focus="minSG", details=TRUE,
conf_force=c("age <= 65","biomarker < 1","biomarker < 2","biomarker < 3",
"biomarker < 4","biomarker < 5","biomarker < 6","biomarker < 7","biomarker < 8",
"biomarker < 9","biomarker < 10"), exclude_cuts=c("biomarker <="), maxk=1)
# of continuous/categorical characteristics 2 7 
Continuous characteristics: age biomarker 
Categorical characteristics: male EU_region AP_region histology surgery ecog1 mAD_CTregimen1 
Default cuts included (1st 20) age <= mean(age) age <= median(age) age <= qlow(age) age <= qhigh(age) biomarker <= mean(biomarker) biomarker <= median(biomarker) biomarker <= qlow(biomarker) biomarker <= qhigh(biomarker) 
Categorical: male EU_region AP_region histology surgery ecog1 mAD_CTregimen1 
Dropping variables (only 1 level): AP_region 
# of candidate subgroup factors= 21 
 [1] "age <= 65"      "biomarker < 1"  "biomarker < 2"  "biomarker < 3" 
 [5] "biomarker < 4"  "biomarker < 5"  "biomarker < 6"  "biomarker < 7" 
 [9] "biomarker < 8"  "biomarker < 9"  "biomarker < 10" "age <= 59.6"   
[13] "age <= 61"      "age <= 52"      "age <= 68"      "male"          
[17] "EU_region"      "histology"      "surgery"        "ecog1"         
[21] "mAD_CTregimen1"
Number of factors evaluated= 21 
Confounders per grf screening q16 q20 q14 q21 q17 q19 q18 q3 q13 q1 q15 q5 q12 q7 q4 q6 q8 q9 q11 q10 q2 
          Factors Labels VI(grf)
16           male    q16  0.1593
20          ecog1    q20  0.1023
14      age <= 52    q14  0.0986
21 mAD_CTregimen1    q21  0.0798
17      EU_region    q17  0.0645
19        surgery    q19  0.0604
18      histology    q18  0.0588
3   biomarker < 2     q3  0.0542
13      age <= 61    q13  0.0529
1       age <= 65     q1  0.0491
15      age <= 68    q15  0.0329
5   biomarker < 4     q5  0.0328
12    age <= 59.6    q12  0.0299
7   biomarker < 6     q7  0.0295
4   biomarker < 3     q4  0.0292
6   biomarker < 5     q6  0.0247
8   biomarker < 7     q8  0.0131
9   biomarker < 8     q9  0.0102
11 biomarker < 10    q11  0.0092
10  biomarker < 9    q10  0.0086
2   biomarker < 1     q2  0.0000
Number of possible configurations (<= maxk): maxk, # <= maxk 1 42 
Approximately 5% of max_count met: minutes 3.333333e-05 
Approximately 10% of max_count met: minutes 6.666667e-05 
Approximately 20% of max_count met: minutes 0.0001166667 
Approximately 33% of max_count met: minutes 0.0001833333 
Approximately 50% of max_count met: minutes 0.0002666667 
Approximately 75% of max_count met: minutes 0.0004166667 
Approximately 90% of max_count met: minutes 0.0004833333 
# of subgroups evaluated based on (up to) maxk-factor combinations 42 
% of all-possible combinations (<= maxk) 100 
k.max= 1 
Events criteria for control,exp= 10 10 
# of subgroups with events less than criteria: control, experimental 1 1 
# of subgroups with sample size less than criteria 1 
# of subgroups meeting all criteria = 40 
# of subgroups fitted (Cox model estimable) = 40 
*Subgroup Searching Minutes=* 0.0005333333 
Number of subgroups meeting HR threshold 1 
Subgroup candidate(s) found (FS) 
# of candidate subgroups (meeting HR criteria) =  1 
SGs (1st 10) meeting screening thresholds sorted by sg_focus= minSG 
Kmet (found), Subgroup, % Consistency= 0 {surgery} 0.547 
Subgroup Consistency Minutes= 0.02083333 
NO subgroup found (FS) 
Minutes forestsearch overall= 0.02335 
Code
if(output) save(fs, file="output/sim3_k1min_null.Rdata")
Code
if(loadit) load("output/sim3_k1min_null.Rdata")
if(is.null(fs$sg.harm)) cat("Subgroup NOT found","\n")
Subgroup NOT found 

We have been focusing on identifying subgroups based on single factors, such as biomarker cuts or other baseline characteristics. In theory, however, subgroup effects could involve more than one factor (in our forestsearch algorithm the maximum number of factors is controlled by maxk; so far we have used maxk=1).

For illustration we now consider selecting the smallest subgroup among 2-factor combinations (maxk=2).

Code
# sg_focus="minSG" selects the smallest subgroup meeting the selection criteria

fs <- forestsearch(df.analysis=df_nonAP, confounders.name=confounders.name,
outcome.name="y.sim",treat.name="treat.sim", event.name="event.sim",
potentialOutcome.name="loghr.po",
hr.threshold = 0.90, hr.consistency = 0.80, pconsistency.threshold = 0.90,
sg_focus = "minSG",
showten_subgroups=TRUE, details=TRUE,
conf_force=c("age <= 65","biomarker < 1","biomarker < 2","biomarker < 3",
"biomarker < 4","biomarker < 5","biomarker < 6","biomarker < 7","biomarker < 8",
"biomarker < 9","biomarker < 10"), exclude_cuts=c("biomarker <="),
maxk=2, plot.sg=TRUE)

if(output) save(fs,file="output/sim3_k2tenmin.Rdata")
Code
if(loadit) load(file="output/sim3_k2tenmin.Rdata")
# Training summaries
SG_est <- sg_tables(fs=fs,which_df="est",est_caption="Non-AP (training) estimates",potentialOutcome.name="loghr.po")

Top subgroups (up to 10) meeting the selection criteria:

Code
# Top 10 (up to 10) SG's
SG_est$sg10_out
Subgroups formed by two-factors
maxk=2
M.1 M.2 N E hr Pcons
{biomarker < 6} {age <= 52} 61 61 1.968404 0.996000
{age <= 59.6} {biomarker < 4} 63 63 1.463912 0.978000
{biomarker < 7} {age <= 52} 64 64 1.972295 0.995000
{age <= 59.6} {biomarker < 5} 64 64 1.500235 0.983000
{EU_region} {biomarker < 5} 65 64 1.490210 0.985000
{biomarker < 8} {age <= 52} 66 66 1.969458 0.997000
{age <= 65} {biomarker < 2} 66 65 1.901186 0.997000
{mAD_CTregimen1} {biomarker < 3} 66 66 1.227319 0.915000
{age <= 61} {biomarker < 3} 67 66 1.855252 0.997000
{biomarker < 10} {age <= 52} 67 67 1.709685 0.996000

6 Simulations

Next we set up simulations to evaluate the ability to identify subgroups from the non-AP data and the accuracy of the resulting treatment-effect estimates in the AP data.

The simulation function mrct_APregion_sims simulates (n_sims times) from the specified data-generating model (dgm):

  • An ITT sample is drawn and subgroups are identified (as above)
  • Cox estimates are computed from the local AP data
  • Subgroups identified in the non-AP data are applied to the AP data
  • For the identified subgroups, Cox model estimates are computed (hr_sg)
    • Note that if a subgroup is NOT found, the “subgroup” is taken to be the entire AP population
    • That is, if no subgroup is identified we interpret hr_sg as the reported AP estimate, which is the overall AP estimate (since no subgroup is found)
    • In the tables below we also summarize hr_sg_null, where hr_sg is set to missing if no subgroup is found
      • This represents the AP subgroup estimate conditional on a subgroup being identified
Code
# Simulation function to automate
mrct_APregion_sims <- function(dgm,n_sims,wname="AP_region",bw=-log(5),sg_focus="minSG",maxk=1, 
hr.threshold=0.90, hr.consistency=0.80, pconsistency.threshold=0.9,details=FALSE){
t.start<-proc.time()[3]
  # message for backends
  # borrowed from simtrials (sim_fixed_n)
  if (!is(plan(), "sequential")) {
    # future backend
    message("Using ", nbrOfWorkers(), " cores with backend ", attr(plan("list")[[1]], "class")[2])
  } else if (foreach::getDoParWorkers() > 1) {
    message("Using ", foreach::getDoParWorkers(), " cores with backend ", foreach::getDoParName())
    message("Warning: ")
    message("doFuture may exhibit suboptimal performance when using a doParallel backend.")
  } else {
    message("Backend uses sequential processing.")
  }
results <- foreach(
    sim = seq_len(n_sims),
    .options.future=list(seed=TRUE),
    .combine="rbind", 
    .errorhandling="pass"
  ) %dofuture% {
# simulate data
dfs <- draw_sim_stratified(dgm=dgm,ss=sim,wname=wname,bw=bw,strata_rand="stratum")
# Subgroups identified with nonAP population
df_nonAP <- subset(dfs,AP_region==0)
# Applied to AP
df_AP <- subset(dfs,AP_region==1)
# For subgroup identified estimates     
# Return NA if not estimable
# First, AP results (does not require subgroup identification)
# Initialize to missing
n_test <- c(nrow(df_AP))
n_train <- c(nrow(df_nonAP))
n_itt <- c(nrow(dfs))

fit <- summary(coxph(Surv(y.sim,event.sim) ~ treat.sim, data=df_nonAP))$conf.int
hr_train <- c(fit[1])
rm("fit")

# Overall (ITT) standard stratified (by randomization strata)
fit <- summary(coxph(Surv(y.sim,event.sim) ~ treat.sim + strata(strata.simR), data=dfs))$conf.int
hr_itt <- c(fit[1])
rm("fit")

# Overall (ITT) standard stratified (by X)
fit <- summary(coxph(Surv(y.sim,event.sim) ~ treat.sim + strata(AP_region), data=dfs))$conf.int
hr_ittX <- c(fit[1])
rm("fit")


hr_test<- NA
any_found <- 0.0
sg_found <- "none"
n_sg <- n_test
hr_sg <- NA
prev_sg <- 1.0

# Restrict to AP estimable
fitAP <- try(summary(coxph(Surv(y.sim,event.sim) ~ treat.sim, data=df_AP))$conf.int,TRUE)
if(!inherits(fitAP,"try-error")){
hr_test <- c(fitAP[1])
rm("fitAP")
# For identified subgroups?
# initialize to testing
any_found <- 0.0 
sg_found <- "none" 
n_sg <- n_test 
prev_sg <- 1.0 # Prevalence of subgroup within AP: n_sg/n_test
hr_sg <- NA
# Potential outcome
POhr_sg <- exp(mean(df_AP$loghr.po))
# Note: if not found then n_sg=n_test, hr_sg=hr_test
fs_temp <- try(forestsearch(df.analysis=df_nonAP, df.test=df_AP, 
confounders.name=confounders.name,
outcome.name="y.sim", treat.name="treat.sim", event.name="event.sim",
potentialOutcome.name="loghr.po",
hr.threshold = hr.threshold, hr.consistency = hr.consistency, 
pconsistency.threshold = pconsistency.threshold,
sg_focus=sg_focus, 
conf_force=c("age <= 65","biomarker < 1","biomarker < 2","biomarker < 3",
"biomarker < 4","biomarker < 5","biomarker < 6","biomarker < 7","biomarker < 8",
"biomarker < 9","biomarker < 10"), exclude_cuts =c("biomarker <="), maxk=maxk),TRUE)
# FS with error output NA's (set above)
# FS without error
if(!inherits(fs_temp,"try-error")){
# FS done (w/o error) but NO sg found (replace NAs above with what is calculated)
if(is.null(fs_temp$sg.harm)){
# consider "sg = AP"  
any_found <- 0.0
sg_found <- "none"
n_sg <- n_test
hr_sg <- hr_test
POhr_sg <- exp(mean(df_AP$loghr.po))
prev_sg <- 1.0
}
# If sg found
if(!is.null(fs_temp$sg.harm)){
any_found <- 1.0  
df_test <- fs_temp$df.test 
sg_found <- c(paste(fs_temp$sg.harm,collapse=" & "))
if(details & sim <= 10) cat("Simulation subgroup found:",c(sim,sg_found),"\n")  
df_sg <- subset(df_test,treat.recommend==1)
n_sg <- c(nrow(df_sg))
prev_sg <- c(n_sg/n_test)
# Subgroup analysis
fitSG <- try(summary(coxph(Surv(y.sim,event.sim)~treat.sim,data=df_sg))$conf.int,TRUE)
if(!inherits(fitSG,"try-error") & is.numeric(fitSG[1])) hr_sg <- c(fitSG[1])
if(inherits(fitSG,"try-error") | !is.numeric(fitSG[1])) hr_sg <- NA
loghr.po <- df_sg$loghr.po
POhr_sg <- exp(mean(loghr.po))
rm("fitSG")
}
 }  # FS without error done
rm("fs_temp")
} # AP estimable in first place
# Results
dfres <- data.table::data.table(n_itt,hr_itt,hr_ittX,n_train,hr_train,n_test,hr_test,any_found,sg_found,n_sg,hr_sg,POhr_sg,prev_sg)
return(dfres)
  }
  t.now<-proc.time()[3]
  t.min<-(t.now-t.start)/60
  if(details){
    cat("Minutes for simulations",c(n_sims,t.min),"\n")
    cat("Projection per 1000",c(t.min*(1000/n_sims)),"\n")
    cat("Propn subgroups found =",c(sum(!is.na(results$any_found))/n_sims),"\n")
  }
# Define hr_sg_null as missing if any_found = 0
results$hr_sg_null <- results$hr_sg
results$hr_sg_null[results$any_found==0] <- NA
return(results)  
}

summaryout_mrct <- function(pop_summary,mrct_sims, 
out_sgs = c("sg_found","sg_biomarker","sg_age"),
out_sgs2 = c("sg_biomarker","sg_age","sg_male","sg_ecog1","sg_histology","sg_CTregimen","sg_EU","sg_surgery"),
out_est = c("+ reldelta + APflag + sg_le85 + APflag2 + APflag3 + hr_sg_null"),
sg_type=1,
tab_caption = c("Identified subgroups and estimation summaries under biomarker effects: 1-factor combinations"), showtable=TRUE){

outwhat1 <- c("~ hr_itt + hr_ittX + hr_test + found + prev_sg + hr_sg + POhr_sg +")
if(sg_type==1) outwhat2 <- paste(out_sgs, collapse="+")
if(sg_type==2) outwhat2 <- paste(out_sgs2, collapse="+")
out_what <- as.formula(paste(outwhat1,outwhat2,out_est))

res <- list()
# True causal AHR
res$ahr_true = c(round(pop_summary$AHR,4))
# AHR Cox un-adjusted (only treatment)
res$ahr_unadj = c(round(pop_summary$ITT_unadj,4))
# Treatment plus randomization stratification sR
res$ahr_sR = c(round(pop_summary$ITT_sR,4))
# adjustment by w
res$ahr_sRw = c(round(pop_summary$ITT_sRw,4))
# Relative biases 
res$bias_unadj <-round(100*c(pop_summary$ITT_unadj-pop_summary$AHR)/pop_summary$AHR,1)
res$bias_sR <-round(100*c(pop_summary$ITT_sR-pop_summary$AHR)/pop_summary$AHR,1)
res$bias_sRw <-round(100*c(pop_summary$ITT_sRw-pop_summary$AHR)/pop_summary$AHR,1)
# The bias within W=1 population
res$ahr_w1 = c(round(pop_summary$W_1,4))
res$bias_w1 <-round(100*c(pop_summary$W_1-pop_summary$AHR)/pop_summary$AHR,1)

mrct_sims$APflag <- ifelse(mrct_sims$hr_test > 0.9,"AP > 0.9","AP <=0.9")
mrct_sims$sg_le85 <- ifelse(mrct_sims$hr_sg <=0.85,"AP(sg)<=0.85","AP(sg)>0.85")

mrct_sims$APflag2 <- ifelse(mrct_sims$hr_test > 0.9 & mrct_sims$hr_sg <= 0.85,"AP > 0.9 & AP(sg) <= 0.85","Not")
mrct_sims$APflag3 <- ifelse(mrct_sims$hr_test > 0.9 & mrct_sims$hr_sg <= 0.80,"AP > 0.9 & AP(sg) <= 0.80","Not")

mrct_sims$reldelta <- 100*c(mrct_sims$hr_test - mrct_sims$hr_sg)/mrct_sims$hr_test

mrct_sims$found <- as.factor(mrct_sims$any_found)

# Classify by specific factors identified
mrct_sims$sg_biomarker <- ifelse(grepl("biomarker",mrct_sims$sg_found),
"biomarker",ifelse(mrct_sims$sg_found!="none","other","none"))

mrct_sims$sg_age <- ifelse(grepl("age",mrct_sims$sg_found),
"age",ifelse(mrct_sims$sg_found!="none","other","none"))

mrct_sims$sg_ecog1 <- ifelse(grepl("ecog1",mrct_sims$sg_found),
"ecog1",ifelse(mrct_sims$sg_found!="none","other","none"))

mrct_sims$sg_histology <- ifelse(grepl("histology",mrct_sims$sg_found),
"histology",ifelse(mrct_sims$sg_found!="none","other","none"))

mrct_sims$sg_CTregimen <- ifelse(grepl("mAD_CTregimen1",mrct_sims$sg_found),
"mAD_CT",ifelse(mrct_sims$sg_found!="none","other","none"))

mrct_sims$sg_male <- ifelse(grepl("male",mrct_sims$sg_found),
"male",ifelse(mrct_sims$sg_found!="none","other","none"))

mrct_sims$sg_surgery <- ifelse(grepl("surgery",mrct_sims$sg_found),
"surgery",ifelse(mrct_sims$sg_found!="none","other","none"))

mrct_sims$sg_EU <- ifelse(grepl("EU_region",mrct_sims$sg_found),
"EU region",ifelse(mrct_sims$sg_found!="none","other","none"))

out_table <- table1(out_what, data=mrct_sims, caption=tab_caption)

if(showtable) print(out_table)

return(list(res=res,out_table=out_table))
}

SGplot_estimates <- function(df,label_training="Training",label_testing="Testing", label_itt="ITT (sR)", label_sg="Testing(subgroup)"){
df_itt <- data.table(est=df$hr_itt, analysis=label_itt)
df_training <- data.table(est=df$hr_train, analysis=label_training)
df_testing <- data.table(est=df$hr_test, analysis=label_testing)
df_sg <- data.table(est=df$hr_sg, analysis=label_sg)

hr_estimates <- rbind(df_itt, df_training, df_testing, df_sg)  
est_order <- c(label_itt,label_training,label_testing,label_sg)
hr_estimates <- within(hr_estimates, {analysis <- factor(analysis,levels=est_order)})
p <- ggplot(hr_estimates, aes(x=analysis, y=est, fill=analysis)) + 
  geom_violin(trim=FALSE)
plot_estimates <- p + geom_boxplot(width=0.1)
return(list(dfPlot_estimates=hr_estimates, plot_estimates=plot_estimates))
} 

6.1 Strong biomarker effect (prognostic effect \(\beta_{w}=-\log(5)\))

Simulations under strong biomarker effect

Code
# Run sims and store results
t.start<-proc.time()[3]
# DGM 
#log.hrs <- log(c(2.5,1.25,0.50))
# example 2
log.hrs <- log(c(3,1.25,0.50))

dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=TRUE)

pop_summary <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=100000,wname="AP_region",
                                   bw=-log(5),strata_rand="stratum",checking=FALSE,details=FALSE,return_df=FALSE)

# True causal AHR
ahr_true = c(round(pop_summary$AHR,4))
# AHR Cox un-adjusted (only treatment)
ahr_unadj = c(round(pop_summary$ITT_unadj,4))
# Treatment plus randomization stratification sR
ahr_sR = c(round(pop_summary$ITT_sR,4))
# sR including adjustment by x
ahr_sRw = c(round(pop_summary$ITT_sRw,4))
# Relative biases 
bias_unadj <-round(100*c(pop_summary$ITT_unadj-pop_summary$AHR)/pop_summary$AHR,1)
bias_sR <-round(100*c(pop_summary$ITT_sR-pop_summary$AHR)/pop_summary$AHR,1)
bias_sRw <-round(100*c(pop_summary$ITT_sRw-pop_summary$AHR)/pop_summary$AHR,1)
# The bias within W=1 population
ahr_w1 = c(round(pop_summary$W_1,4))
bias_w1 <-round(100*c(pop_summary$W_1-pop_summary$AHR)/pop_summary$AHR,1)

cat("AHR true (causal) and within AP:",c(ahr_true,ahr_w1),"\n")

n_sims <- nsims

library(doFuture)
library(doRNG)
registerDoFuture()
registerDoRNG()

mrct_sims <- mrct_APregion_sims(dgm=dgm,n_sims=n_sims,wname="AP_region",
                                bw=-log(5),sg_focus="minSG",details=TRUE)

t.now<-proc.time()[3]
t.min<-(t.now-t.start)/60
cat("Minutes (doFuture) simulations",c(t.min),"\n")

cat("How often was a subgroup identified=",c(mean(mrct_sims$any_found)),"\n")

if(output) save(n_sims,pop_summary,dgm,mrct_sims,file="output/mrct_sims3_v0.Rdata")
Code
if(loadit) load(file="output/mrct_sims3_v0.Rdata")

# Look at large-sample spline in non-AP
dflarge <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=10000,wname="AP_region",bw=-log(5),
                               strata_rand="stratum",checking=FALSE,details=FALSE,return_df=TRUE)

temp <- cox_cs_fit(df=subset(dflarge,AP_region==0),tte.name="y.sim",event.name="event.sim",treat.name="treat.sim",
strata.name="strata.simO",z.name=c("z"), details=FALSE ,boots=0, xlab=c("Biomarker"),maxz = 25, ydel=0.5, cex_legend=0.5, show_plot=TRUE,truebeta.name="loghr.po",
main_title=c("Non-AP large sample ~ training estimation properties"))

Code
true_pos <- round(100*c(mean(mrct_sims$any_found)),4)

temp <- summaryout_mrct(pop_summary=pop_summary, mrct_sims=mrct_sims,showtable=FALSE)

ahr_true <- c(temp$res$ahr_true)
ahr_unadj <- c(temp$res$ahr_unadj)
ahr_sR <- c(temp$res$ahr_sR)
ahr_sRw <- c(temp$res$ahr_sRw)
ahr_w1 <- c(temp$res$ahr_w1)
  • The underlying true ITT causal hazard ratio is 0.7368
  • The large-sample (single draw) approximation to the ITT un-adjusted analysis is 0.7904
  • The large-sample approximation to the ITT stratified analysis is 0.7641
  • The large-sample approximation to the ITT stratified + x-adjustment analysis is 0.7458
  • The large-sample approximation to the AP region analysis is 1.016

The true-positive rate, i.e., identifying a subgroup under heterogeneous treatment effects (HTE):

  • A subgroup was identified in 100% of the 1,000 simulations

Summary of subgroups identified and estimation properties are:

Code
# Show the table
temp$out_table
Identified subgroups and estimation summaries under biomarker effects: 1-factor combinations
Overall
(N=1000)
hr_itt
Mean (SD) 0.778 (0.0683)
Median [Min, Max] 0.775 [0.564, 1.03]
hr_ittX
Mean (SD) 0.764 (0.0662)
Median [Min, Max] 0.763 [0.555, 1.01]
hr_test
Mean (SD) 1.05 (0.266)
Median [Min, Max] 1.03 [0.431, 2.50]
found
1 1000 (100%)
prev_sg
Mean (SD) 0.800 (0.00613)
Median [Min, Max] 0.800 [0.688, 0.800]
hr_sg
Mean (SD) 0.825 (0.244)
Median [Min, Max] 0.802 [0.288, 2.13]
POhr_sg
Mean (SD) 0.536 (0.00878)
Median [Min, Max] 0.536 [0.536, 0.696]
sg_found
!{age <= 68} 3 (0.3%)
{biomarker < 2} 997 (99.7%)
sg_biomarker
biomarker 997 (99.7%)
other 3 (0.3%)
sg_age
age 3 (0.3%)
other 997 (99.7%)
reldelta
Mean (SD) 21.6 (10.1)
Median [Min, Max] 22.0 [-16.7, 52.7]
APflag
AP <=0.9 313 (31.3%)
AP > 0.9 687 (68.7%)
sg_le85
AP(sg)<=0.85 607 (60.7%)
AP(sg)>0.85 393 (39.3%)
APflag2
AP > 0.9 & AP(sg) <= 0.85 297 (29.7%)
Not 703 (70.3%)
APflag3
AP > 0.9 & AP(sg) <= 0.80 192 (19.2%)
Not 808 (80.8%)
hr_sg_null
Mean (SD) 0.825 (0.244)
Median [Min, Max] 0.802 [0.288, 2.13]
Code
temp <- SGplot_estimates(df=mrct_sims, label_training="Non-AP, ITT", label_itt="Overall, ITT", label_testing="AP, ITT", label_sg="AP, identified subgroup")
plot(temp$plot_estimates)

6.2 Null model where benefit is uniform (prognostic effect \(\beta_{w}=-\log(5)\))

Simulations under the null where treatment benefit is uniform; the treatment effect does not depend on any baseline factors (hazard ratio 0.70)

Code
# Increase simulations to 5,000 under "null"
t.start<-proc.time()[3]

# Null uniform benefit hr=0.7
log.hrs <- log(c(0.7,0.7,0.7))
dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=TRUE)

pop_summary <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=100000,wname="AP_region",
                                   bw=-log(5),strata_rand="stratum",checking=FALSE,details=FALSE,return_df=FALSE)

# True causal AHR
ahr_true = c(round(pop_summary$AHR,4))
# AHR Cox un-adjusted (only treatment)
ahr_unadj = c(round(pop_summary$ITT_unadj,4))
# Treatment plus randomization stratification sR
ahr_sR = c(round(pop_summary$ITT_sR,4))
# sR including adjustment by x
ahr_sRw = c(round(pop_summary$ITT_sRw,4))
# Relative biases 
bias_unadj <-round(100*c(pop_summary$ITT_unadj-pop_summary$AHR)/pop_summary$AHR,1)
bias_sR <-round(100*c(pop_summary$ITT_sR-pop_summary$AHR)/pop_summary$AHR,1)
bias_sRw <-round(100*c(pop_summary$ITT_sRw-pop_summary$AHR)/pop_summary$AHR,1)
# The bias within W=1 population
ahr_w1 = c(round(pop_summary$W_1,4))
bias_w1 <-round(100*c(pop_summary$W_1-pop_summary$AHR)/pop_summary$AHR,1)

cat("AHR true (causal) and within AP:",c(ahr_true,ahr_w1),"\n")

n_sims <- nsims

library(doFuture)
library(doRNG)
registerDoFuture()
registerDoRNG()


mrct_sims <- mrct_APregion_sims(dgm=dgm,n_sims=n_sims,wname="AP_region",
                                bw=-log(5),sg_focus="minSG",details=TRUE)

t.now<-proc.time()[3]
t.min<-(t.now-t.start)/60
cat("Minutes (doFuture) simulations",c(t.min),"\n")


cat("How often was a subgroup identified=",c(mean(mrct_sims$any_found)),"\n")

if(output) save(n_sims,pop_summary,dgm,mrct_sims,file="output/mrct_sims03_v0.Rdata")
Code
if(loadit) load(file="output/mrct_sims03_v0.Rdata")

false_pos <- round(100*c(mean(mrct_sims$any_found)),4)

# Look at large-sample spline in non-AP
dflarge <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=10000,wname="AP_region",bw=-log(5),
                               strata_rand="stratum",checking=FALSE,details=FALSE,return_df=TRUE)

temp <- cox_cs_fit(df=subset(dflarge,AP_region==0),tte.name="y.sim",event.name="event.sim",treat.name="treat.sim",
strata.name="strata.simO",z.name=c("z"), details=FALSE ,boots=0, xlab=c("Biomarker"),maxz = 25, ydel=0.5, cex_legend=0.5, show_plot=TRUE,truebeta.name="loghr.po",
main_title=c("Non-AP large sample ~ training estimation properties"))

Code
# Note, here too many individual subgroup cuts, set sg_type=2
temp <- summaryout_mrct(pop_summary=pop_summary, mrct_sims=mrct_sims,showtable=FALSE, 
                        sg_type=2, tab_caption=c("Identified subgroups and estimation summaries under uniform treatment effects: 1-factor combinations"))

ahr_true <- c(temp$res$ahr_true)
ahr_unadj <- c(temp$res$ahr_unadj)
ahr_sR <- c(temp$res$ahr_sR)
ahr_sRw <- c(temp$res$ahr_sRw)
ahr_w1 <- c(temp$res$ahr_w1)
  • The underlying true ITT causal hazard ratio is 0.7
  • The large-sample (single draw) approximation to the ITT un-adjusted analysis is 0.7619
  • The large-sample approximation to the ITT stratified analysis is 0.706
  • The large-sample approximation to the ITT stratified + x-adjustment analysis is 0.7056
  • The large-sample approximation to the AP region analysis is 0.7173

The false-positive rate, i.e., identifying a subgroup under uniform benefit:

  • A subgroup was falsely identified in 9.9% of the 1,000 simulations

Summary of subgroups identified and estimation properties are:

Code
# Show the table
temp$out_table
Identified subgroups and estimation summaries under uniform treatment effects: 1-factor combinations
Overall
(N=1000)
hr_itt
Mean (SD) 0.708 (0.0689)
Median [Min, Max] 0.704 [0.491, 0.996]
hr_ittX
Mean (SD) 0.708 (0.0675)
Median [Min, Max] 0.705 [0.493, 1.01]
hr_test
Mean (SD) 0.733 (0.208)
Median [Min, Max] 0.712 [0.271, 1.80]
found
0 901 (90.1%)
1 99 (9.9%)
prev_sg
Mean (SD) 0.965 (0.122)
Median [Min, Max] 1.00 [0, 1.00]
hr_sg
Mean (SD) 0.739 (0.233)
Median [Min, Max] 0.715 [0.261, 3.67]
Missing 2 (0.2%)
POhr_sg
Mean (SD) 0.700 (0)
Median [Min, Max] 0.700 [0.700, 0.700]
Missing 1 (0.1%)
sg_biomarker
biomarker 20 (2.0%)
none 901 (90.1%)
other 79 (7.9%)
sg_age
age 41 (4.1%)
none 901 (90.1%)
other 58 (5.8%)
sg_male
male 10 (1.0%)
none 901 (90.1%)
other 89 (8.9%)
sg_ecog1
ecog1 7 (0.7%)
none 901 (90.1%)
other 92 (9.2%)
sg_histology
histology 3 (0.3%)
none 901 (90.1%)
other 96 (9.6%)
sg_CTregimen
mAD_CT 2 (0.2%)
none 901 (90.1%)
other 97 (9.7%)
sg_EU
EU region 4 (0.4%)
none 901 (90.1%)
other 95 (9.5%)
sg_surgery
none 901 (90.1%)
other 87 (8.7%)
surgery 12 (1.2%)
reldelta
Mean (SD) -0.755 (14.7)
Median [Min, Max] 0 [-396, 55.0]
Missing 2 (0.2%)
APflag
AP <=0.9 796 (79.6%)
AP > 0.9 204 (20.4%)
sg_le85
AP(sg)<=0.85 732 (73.2%)
AP(sg)>0.85 266 (26.6%)
Missing 2 (0.2%)
APflag2
AP > 0.9 & AP(sg) <= 0.85 3 (0.3%)
Not 997 (99.7%)
APflag3
AP > 0.9 & AP(sg) <= 0.80 2 (0.2%)
Not 998 (99.8%)
hr_sg_null
Mean (SD) 0.798 (0.392)
Median [Min, Max] 0.745 [0.261, 3.67]
Missing 903 (90.3%)
Code
temp <- SGplot_estimates(df=mrct_sims, label_training="Non-AP, ITT", label_itt="Overall, ITT", label_testing="AP, ITT", label_sg="AP, identified subgroup")
plot(temp$plot_estimates)

6.3 Null model where benefit is uniform (prognostic effect \(\beta_{w}=-\log(5)\)): maxk=2

Simulations under the null where treatment benefit is uniform (maxk=2); the treatment effect does not depend on any baseline factors (hazard ratio 0.70)

Note: allowing for 2-factor combinations may require more stringent hazard-ratio thresholds; here we increase hr.threshold to 1.25 (hr.consistency to 1.0) as in the paper

Code
t.start<-proc.time()[3]

# Null uniform benefit hr=0.7
log.hrs <- log(c(0.7,0.7,0.7))
dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=TRUE)

pop_summary <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=100000,wname="AP_region",
                                   bw=-log(5),strata_rand="stratum",checking=FALSE,details=FALSE,return_df=FALSE)

# True causal AHR
ahr_true = c(round(pop_summary$AHR,4))
# AHR Cox un-adjusted (only treatment)
ahr_unadj = c(round(pop_summary$ITT_unadj,4))
# Treatment plus randomization stratification sR
ahr_sR = c(round(pop_summary$ITT_sR,4))
# sR including adjustment by x
ahr_sRw = c(round(pop_summary$ITT_sRw,4))
# Relative biases 
bias_unadj <-round(100*c(pop_summary$ITT_unadj-pop_summary$AHR)/pop_summary$AHR,1)
bias_sR <-round(100*c(pop_summary$ITT_sR-pop_summary$AHR)/pop_summary$AHR,1)
bias_sRw <-round(100*c(pop_summary$ITT_sRw-pop_summary$AHR)/pop_summary$AHR,1)
# The bias within W=1 population
ahr_w1 = c(round(pop_summary$W_1,4))
bias_w1 <-round(100*c(pop_summary$W_1-pop_summary$AHR)/pop_summary$AHR,1)

cat("AHR true (causal) and within AP:",c(ahr_true,ahr_w1),"\n")

n_sims <- nsims

library(doFuture)
library(doRNG)
registerDoFuture()
registerDoRNG()

mrct_sims <- mrct_APregion_sims(dgm=dgm,n_sims=n_sims,wname="AP_region",bw=-log(5),
sg_focus="hr", maxk=2, hr.threshold=1.25, hr.consistency=1.0, details=TRUE)

t.now<-proc.time()[3]
t.min<-(t.now-t.start)/60
cat("Minutes (doFuture) simulations",c(t.min),"\n")


cat("How often was a subgroup identified=",c(mean(mrct_sims$any_found)),"\n")

if(output) save(n_sims,pop_summary,dgm,mrct_sims,file="output/mrct_sims03_k2_v0.Rdata")
Code
if(loadit) load(file="output/mrct_sims03_k2_v0.Rdata")
false_pos <- round(100*c(mean(mrct_sims$any_found)),4)

cat("What do a few non-null look like","\n")
What do a few non-null look like 
Code
head(mrct_sims$sg_found[which(mrct_sims$any_found==1)])
[1] "{EU_region} & {age <= 59.6}"         "!{biomarker < 7} & !{histology}"    
[3] "!{mAD_CTregimen1} & {biomarker < 6}" "!{ecog1} & !{biomarker < 4}"        
[5] "!{age <= 65} & {biomarker < 9}"      "{surgery} & !{biomarker < 5}"       
Code
# Look at large-sample spline in non-AP
# This is same as above so do not repeat
# dflarge <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=10000,xname="AP_region",bx=log(5),strata_rand="stratum",checking=FALSE,details=FALSE,return_df=TRUE)
# temp <- cox_cs_fit(df=subset(dflarge,AP_region==0),tte.name="y.sim",event.name="event.sim",treat.name="treat.sim",
# strata.name="strata.simO",z.name=c("z"), details=FALSE ,boots=0, xlab=c("Biomarker"),maxz = 25, ydel=0.5, cex_legend=0.5, show_plot=TRUE,truebeta.name="loghr.po",
# main_title=c("Non-AP large sample ~ training estimation properties"))

# Note, here too many individual subgroup cuts, set sg_type=2
temp <- summaryout_mrct(pop_summary=pop_summary, mrct_sims=mrct_sims,showtable=FALSE, 
                        sg_type=2, tab_caption=c("Identified subgroups and estimation summaries under uniform treatment effects: 2-factor combinations"))

ahr_true <- c(temp$res$ahr_true)
ahr_unadj <- c(temp$res$ahr_unadj)
ahr_sR <- c(temp$res$ahr_sR)
ahr_sRw <- c(temp$res$ahr_sRw)
ahr_w1 <- c(temp$res$ahr_w1)
  • The underlying true ITT causal hazard ratio is 0.7
  • The large-sample (single draw) approximation to the ITT un-adjusted analysis is 0.7619
  • The large-sample approximation to the ITT stratified analysis is 0.706
  • The large-sample approximation to the ITT stratified + x-adjustment analysis is 0.7056
  • The large-sample approximation to the AP region analysis is 0.7173

The false-positive rate, i.e., identifying a subgroup under uniform benefit:

  • A subgroup was falsely identified in 7.6% of the 1,000 simulations

Summary of subgroups identified and estimation properties are:

Code
# Show the table
temp$out_table
Identified subgroups and estimation summaries under uniform treatment effects: 2-factor combinations
Overall
(N=1000)
hr_itt
Mean (SD) 0.708 (0.0689)
Median [Min, Max] 0.704 [0.491, 0.996]
hr_ittX
Mean (SD) 0.708 (0.0675)
Median [Min, Max] 0.705 [0.493, 1.01]
hr_test
Mean (SD) 0.733 (0.208)
Median [Min, Max] 0.712 [0.271, 1.80]
found
0 924 (92.4%)
1 76 (7.6%)
prev_sg
Mean (SD) 0.986 (0.0642)
Median [Min, Max] 1.00 [0.360, 1.00]
hr_sg
Mean (SD) 0.733 (0.209)
Median [Min, Max] 0.711 [0.271, 1.80]
POhr_sg
Mean (SD) 0.700 (0)
Median [Min, Max] 0.700 [0.700, 0.700]
sg_biomarker
biomarker 41 (4.1%)
none 924 (92.4%)
other 35 (3.5%)
sg_age
age 48 (4.8%)
none 924 (92.4%)
other 28 (2.8%)
sg_male
male 10 (1.0%)
none 924 (92.4%)
other 66 (6.6%)
sg_ecog1
ecog1 9 (0.9%)
none 924 (92.4%)
other 67 (6.7%)
sg_histology
histology 7 (0.7%)
none 924 (92.4%)
other 69 (6.9%)
sg_CTregimen
mAD_CT 11 (1.1%)
none 924 (92.4%)
other 65 (6.5%)
sg_EU
EU region 13 (1.3%)
none 924 (92.4%)
other 63 (6.3%)
sg_surgery
none 924 (92.4%)
other 67 (6.7%)
surgery 9 (0.9%)
reldelta
Mean (SD) 0.0373 (4.15)
Median [Min, Max] 0 [-46.4, 54.3]
APflag
AP <=0.9 796 (79.6%)
AP > 0.9 204 (20.4%)
sg_le85
AP(sg)<=0.85 743 (74.3%)
AP(sg)>0.85 257 (25.7%)
APflag2
AP > 0.9 & AP(sg) <= 0.85 3 (0.3%)
Not 997 (99.7%)
APflag3
AP > 0.9 & AP(sg) <= 0.80 2 (0.2%)
Not 998 (99.8%)
hr_sg_null
Mean (SD) 0.736 (0.226)
Median [Min, Max] 0.717 [0.303, 1.19]
Missing 924 (92.4%)
Code
temp <- SGplot_estimates(df=mrct_sims, label_training="Non-AP, ITT", label_itt="Overall, ITT", label_testing="AP, ITT", label_sg="AP, identified subgroup")
plot(temp$plot_estimates)

6.4 Strong biomarker effect (prognostic effect \(\beta_{w}=-\log(5)\)): maxk=2

Simulations under strong biomarker and prognostic effects, allowing for 2-factor combinations (maxk=2)

Code
t.start<-proc.time()[3]

# DGM 
#log.hrs <- log(c(2,1.0,0.60))
log.hrs <- log(c(3,1.25,0.50))

dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=TRUE)

pop_summary <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=100000,wname="AP_region",bw=-log(5),
                                   strata_rand="stratum",checking=FALSE,details=FALSE,return_df=FALSE)

# True causal AHR
ahr_true = c(round(pop_summary$AHR,4))
# AHR Cox un-adjusted (only treatment)
ahr_unadj = c(round(pop_summary$ITT_unadj,4))
# Treatment plus randomization stratification sR
ahr_sR = c(round(pop_summary$ITT_sR,4))
# sR including adjustment by x
ahr_sRw = c(round(pop_summary$ITT_sRw,4))
# Relative biases 
bias_unadj <-round(100*c(pop_summary$ITT_unadj-pop_summary$AHR)/pop_summary$AHR,1)
bias_sR <-round(100*c(pop_summary$ITT_sR-pop_summary$AHR)/pop_summary$AHR,1)
bias_sRw <-round(100*c(pop_summary$ITT_sRw-pop_summary$AHR)/pop_summary$AHR,1)
# The bias within W=1 population
ahr_w1 = c(round(pop_summary$W_1,4))
bias_w1 <-round(100*c(pop_summary$W_1-pop_summary$AHR)/pop_summary$AHR,1)

cat("AHR true (causal) and within AP:",c(ahr_true,ahr_w1),"\n")

n_sims <- nsims

library(doFuture)
library(doRNG)
registerDoFuture()
registerDoRNG()

mrct_sims <- mrct_APregion_sims(dgm=dgm,n_sims=n_sims,wname="AP_region",bw=-log(5),
sg_focus="hr",maxk=2, hr.threshold=1.25, hr.consistency=1.0, details=TRUE)

t.now<-proc.time()[3]
t.min<-(t.now-t.start)/60
cat("Minutes (doFuture) simulations",c(t.min),"\n")

cat("How often was a subgroup identified=",c(mean(mrct_sims$any_found)),"\n")

if(output) save(n_sims,pop_summary,dgm,mrct_sims,file="output/mrct_sims3_k2_v0.Rdata")
Code
if(loadit) load(file="output/mrct_sims3_k2_v0.Rdata")
true_pos <- round(100*c(mean(mrct_sims$any_found)),4)

cat("What do a few non-null look like","\n")
What do a few non-null look like 
Code
head(mrct_sims$sg_found[which(mrct_sims$any_found==1)])
[1] "{biomarker < 2} & {age <= 68}"   "{biomarker < 2} & !{age <= 52}" 
[3] "{age <= 61} & {biomarker < 4}"   "{age <= 59.6} & {biomarker < 4}"
[5] "{biomarker < 5} & !{age <= 61}"  "{biomarker < 6} & !{ecog1}"     
Code
# Note, here too many individual subgroup cuts, set sg_type=2
temp <- summaryout_mrct(pop_summary=pop_summary, mrct_sims=mrct_sims,showtable=FALSE, sg_type=2, 
                        tab_caption=c("Identified subgroups and estimation summaries under biomarker effects: 2-factor combinations"))

ahr_true <- c(temp$res$ahr_true)
ahr_unadj <- c(temp$res$ahr_unadj)
ahr_sR <- c(temp$res$ahr_sR)
ahr_sRw <- c(temp$res$ahr_sRw)
ahr_w1 <- c(temp$res$ahr_w1)
  • The underlying true ITT causal hazard ratio is 0.7368
  • The large-sample (single draw) approximation to the ITT un-adjusted analysis is 0.7904
  • The large-sample approximation to the ITT stratified analysis is 0.7641
  • The large-sample approximation to the ITT stratified + x-adjustment analysis is 0.7458
  • The large-sample approximation to the AP region analysis is 1.016

The true-positive rate, i.e., identifying a subgroup under strong biomarker effects:

  • A subgroup was identified in 100% of the 1,000 simulations

Summary of subgroups identified and estimation properties are:

Code
# Show the table
temp$out_table
Identified subgroups and estimation summaries under biomarker effects: 2-factor combinations
Overall
(N=1000)
hr_itt
Mean (SD) 0.778 (0.0683)
Median [Min, Max] 0.775 [0.564, 1.03]
hr_ittX
Mean (SD) 0.764 (0.0662)
Median [Min, Max] 0.763 [0.555, 1.01]
hr_test
Mean (SD) 1.05 (0.266)
Median [Min, Max] 1.03 [0.431, 2.50]
found
1 1000 (100%)
prev_sg
Mean (SD) 0.837 (0.104)
Median [Min, Max] 0.856 [0.384, 1.00]
hr_sg
Mean (SD) 0.875 (0.278)
Median [Min, Max] 0.846 [0.223, 2.13]
POhr_sg
Mean (SD) 0.578 (0.0982)
Median [Min, Max] 0.593 [0.185, 0.730]
sg_biomarker
biomarker 1000 (100%)
sg_age
age 421 (42.1%)
other 579 (57.9%)
sg_male
male 85 (8.5%)
other 915 (91.5%)
sg_ecog1
ecog1 46 (4.6%)
other 954 (95.4%)
sg_histology
histology 60 (6.0%)
other 940 (94.0%)
sg_CTregimen
mAD_CT 173 (17.3%)
other 827 (82.7%)
sg_EU
EU region 130 (13.0%)
other 870 (87.0%)
sg_surgery
other 919 (91.9%)
surgery 81 (8.1%)
reldelta
Mean (SD) 16.7 (14.7)
Median [Min, Max] 15.2 [-13.7, 71.2]
APflag
AP <=0.9 313 (31.3%)
AP > 0.9 687 (68.7%)
sg_le85
AP(sg)<=0.85 505 (50.5%)
AP(sg)>0.85 495 (49.5%)
APflag2
AP > 0.9 & AP(sg) <= 0.85 214 (21.4%)
Not 786 (78.6%)
APflag3
AP > 0.9 & AP(sg) <= 0.80 159 (15.9%)
Not 841 (84.1%)
hr_sg_null
Mean (SD) 0.875 (0.278)
Median [Min, Max] 0.846 [0.223, 2.13]
Code
temp <- SGplot_estimates(df=mrct_sims, label_training="Non-AP, ITT", label_itt="Overall, ITT", label_testing="AP, ITT", label_sg="AP, identified subgroup")
plot(temp$plot_estimates)

6.5 Strong biomarker effect (prognostic effect \(\beta_{w}=0\))

Simulations under strong biomarker effect but null prognostic effect

Code
# Run sims and store results
t.start<-proc.time()[3]
# DGM 
#log.hrs <- log(c(2.5,1.25,0.50))
# example 2
log.hrs <- log(c(3,1.25,0.50))

dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=TRUE)

pop_summary <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=100000,wname="AP_region",
                                   bw=0.0,strata_rand="stratum",checking=FALSE,details=FALSE,return_df=FALSE)

# True causal AHR
ahr_true = c(round(pop_summary$AHR,4))
# AHR Cox un-adjusted (only treatment)
ahr_unadj = c(round(pop_summary$ITT_unadj,4))
# Treatment plus randomization stratification sR
ahr_sR = c(round(pop_summary$ITT_sR,4))
# sR including adjustment by x
ahr_sRw = c(round(pop_summary$ITT_sRw,4))
# Relative biases 
bias_unadj <-round(100*c(pop_summary$ITT_unadj-pop_summary$AHR)/pop_summary$AHR,1)
bias_sR <-round(100*c(pop_summary$ITT_sR-pop_summary$AHR)/pop_summary$AHR,1)
bias_sRw <-round(100*c(pop_summary$ITT_sRw-pop_summary$AHR)/pop_summary$AHR,1)
# The bias within W=1 population
ahr_w1 = c(round(pop_summary$W_1,4))
bias_w1 <-round(100*c(pop_summary$W_1-pop_summary$AHR)/pop_summary$AHR,1)

cat("AHR true (causal) and within AP:",c(ahr_true,ahr_w1),"\n")

n_sims <- nsims

library(doFuture)
library(doRNG)
registerDoFuture()
registerDoRNG()

mrct_sims <- mrct_APregion_sims(dgm=dgm,n_sims=n_sims,wname="AP_region",
                                bw=0.0,sg_focus="minSG",details=TRUE)

t.now<-proc.time()[3]
t.min<-(t.now-t.start)/60
cat("Minutes (doFuture) simulations",c(t.min),"\n")

cat("How often was a subgroup identified=",c(mean(mrct_sims$any_found)),"\n")

if(output) save(n_sims,pop_summary,dgm,mrct_sims,file="output/mrct_sims3Wnull_v0.Rdata")
Code
if(loadit) load(file="output/mrct_sims3Wnull_v0.Rdata")

# Look at large-sample spline in non-AP
dflarge <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=10000,wname="AP_region",bw=0.0,
                               strata_rand="stratum",checking=FALSE,details=FALSE,return_df=TRUE)

temp <- cox_cs_fit(df=subset(dflarge,AP_region==0),tte.name="y.sim",event.name="event.sim",treat.name="treat.sim",
strata.name="strata.simO",z.name=c("z"), details=FALSE ,boots=0, xlab=c("Biomarker"),maxz = 25, ydel=0.5, cex_legend=0.5, show_plot=TRUE,truebeta.name="loghr.po",
main_title=c("Non-AP large sample ~ training estimation properties"))

Code
true_pos <- round(100*c(mean(mrct_sims$any_found)),4)

temp <- summaryout_mrct(pop_summary=pop_summary, mrct_sims=mrct_sims,showtable=FALSE)

ahr_true <- c(temp$res$ahr_true)
ahr_unadj <- c(temp$res$ahr_unadj)
ahr_sR <- c(temp$res$ahr_sR)
ahr_sRw <- c(temp$res$ahr_sRw)
ahr_w1 <- c(temp$res$ahr_w1)
  • The underlying true ITT causal hazard ratio is 0.7368
  • The large-sample (single draw) approximation to the ITT un-adjusted analysis is 0.7157
  • The large-sample approximation to the ITT stratified analysis is 0.7196
  • The large-sample approximation to the ITT stratified + x-adjustment analysis is 0.7156
  • The large-sample approximation to the AP region analysis is 0.7063

The true-positive rate, i.e., identifying a subgroup under heterogeneous treatment effects (HTE):

  • A subgroup was identified in 100% of the 1,000 simulations

Summary of subgroups identified and estimation properties are:

Code
# Show the table
temp$out_table
Identified subgroups and estimation summaries under biomarker effects: 1-factor combinations
Overall
(N=1000)
hr_itt
Mean (SD) 0.734 (0.0595)
Median [Min, Max] 0.732 [0.530, 0.946]
hr_ittX
Mean (SD) 0.721 (0.0578)
Median [Min, Max] 0.718 [0.530, 0.928]
hr_test
Mean (SD) 0.724 (0.122)
Median [Min, Max] 0.715 [0.421, 1.25]
found
1 1000 (100%)
prev_sg
Mean (SD) 0.800 (0.00613)
Median [Min, Max] 0.800 [0.688, 0.800]
hr_sg
Mean (SD) 0.577 (0.115)
Median [Min, Max] 0.570 [0.302, 1.11]
POhr_sg
Mean (SD) 0.536 (0.00878)
Median [Min, Max] 0.536 [0.536, 0.696]
sg_found
!{age <= 68} 3 (0.3%)
{biomarker < 2} 997 (99.7%)
sg_biomarker
biomarker 997 (99.7%)
other 3 (0.3%)
sg_age
age 3 (0.3%)
other 997 (99.7%)
reldelta
Mean (SD) 20.5 (6.48)
Median [Min, Max] 20.7 [1.42, 41.7]
APflag
AP <=0.9 912 (91.2%)
AP > 0.9 88 (8.8%)
sg_le85
AP(sg)<=0.85 980 (98.0%)
AP(sg)>0.85 20 (2.0%)
APflag2
AP > 0.9 & AP(sg) <= 0.85 68 (6.8%)
Not 932 (93.2%)
APflag3
AP > 0.9 & AP(sg) <= 0.80 51 (5.1%)
Not 949 (94.9%)
hr_sg_null
Mean (SD) 0.577 (0.115)
Median [Min, Max] 0.570 [0.302, 1.11]
Code
temp <- SGplot_estimates(df=mrct_sims, label_training="Non-AP, ITT", label_itt="Overall, ITT", label_testing="AP, ITT", label_sg="AP, identified subgroup")
plot(temp$plot_estimates)

7 Appendix

7.1 Brief Overview of Methods

  • forestSearch (León et al. (2024)):
    • Subgroups (SGs) with HR estimates indicative of “sub-optimal benefit” (e.g., \(hr \geq 1.0\) threshold) are considered candidates, with a “splitting consistency criterion” for selection1.
      • In essence, if a subgroup that appears to derive the least benefit is identified, then the complement may potentially be considered to derive benefit with a “higher degree of confidence” relative to the ITT population
    • SGs are formed by single factors (e.g., SG1 = males, SG2 = age<65) and two-way combinations (e.g., SG3 = males & age<65)
    • SGs with a minimum size of \(60\) and a minimum of 10 events in each treatment arm are considered
    • Reversing the role of treatment allows for identifying subgroups with “enhanced benefit” (e.g., \(hr \leq 0.65\) threshold)
    • Continuous factors are cut at either medians (q2), or quartiles (q1, q2, q3, say)
    • Cuts are also identified by generalized random forests (GRF), which is an identification approach in itself –
  • Generalized random forests (Athey, Tibshirani, and Wager (2019), Cui et al. (2023)), based on restricted mean survival time (RMST) comparisons as implemented in the R package grf (Tibshirani et al. (2022)):
    • An RMST benefit of (at least) \(3\) months for control is required for selection of a subgroup, where among tree depths of 1 and 2 the subgroup with the largest RMST benefit (\(\geq 3\) months in favor of control) is selected
    • Similar to forestSearch, we are targeting relatively large effects
    • SGs with sample sizes of at least \(60\) participants and a maximum tree depth of 2 are required
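
The splitting consistency criterion described above can be sketched in a few lines. This is an illustrative reconstruction, not the forestSearch source: it assumes a candidate-subgroup data frame with the variable names used in the simulation calls (y.sim, event.sim, treat.sim), and the function name split_consistency is hypothetical.

```r
library(survival)

# Sketch: proportion of random half-splits in which a candidate subgroup's
# Cox HR estimate clears hr.consistency in BOTH halves. A candidate whose
# full-data HR clears hr.threshold would be retained only when this
# proportion also clears pconsistency.threshold (e.g., 0.90).
split_consistency <- function(df_sg, hr.consistency = 1.0, n_splits = 200) {
  meets <- vapply(seq_len(n_splits), function(i) {
    idx <- sample(nrow(df_sg), floor(nrow(df_sg) / 2))
    hr1 <- exp(coef(coxph(Surv(y.sim, event.sim) ~ treat.sim, data = df_sg[idx, ])))
    hr2 <- exp(coef(coxph(Surv(y.sim, event.sim) ~ treat.sim, data = df_sg[-idx, ])))
    (hr1 >= hr.consistency) && (hr2 >= hr.consistency)
  }, logical(1))
  mean(meets)  # compare against pconsistency.threshold
}
```

With the roles of treatment reversed (or with est.scale="1/hr" as in the calls above), the same check targets subgroups with enhanced rather than sub-optimal benefit.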

7.2 Example of bootstrap adjusted estimates and CIs

Example of bootstrap de-biased hazard-ratio (point) and CI estimation. Note that this is relevant for inference with respect to the training population; that is, we are interested in valid estimation when based on the same population from which the subgroups were identified. For example, if the AP data will also be used for identifying the subgroups, then we would want to account for the identification algorithm. Here we illustrate with the non-AP data.
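
The de-biasing logic follows the usual bootstrap optimism correction. The sketch below is illustrative only and is not the forestsearch_bootstrap_dofuture implementation; search_fn and hr_in_sg are hypothetical placeholders standing in for re-running the subgroup search and for fitting the subgroup Cox model.

```r
# Sketch of bootstrap optimism correction for a subgroup hazard ratio.
# search_fn(df): hypothetical wrapper returning the identified subgroup rule
# hr_in_sg(df, sg): hypothetical wrapper fitting coxph() within subgroup sg of df
debias_hr <- function(df, search_fn, hr_in_sg, n_boot = 1000) {
  # Apparent HR: subgroup identified and evaluated on the same data
  hr_apparent <- hr_in_sg(df, search_fn(df))
  optimism <- replicate(n_boot, {
    dfb <- df[sample(nrow(df), replace = TRUE), ]  # bootstrap resample
    sg_b <- search_fn(dfb)                         # re-run search per resample
    # Optimism on the log-HR scale: bootstrap-sample performance minus
    # performance of the same rule applied back to the original data
    log(hr_in_sg(dfb, sg_b)) - log(hr_in_sg(df, sg_b))
  })
  exp(log(hr_apparent) - mean(optimism, na.rm = TRUE))  # de-biased HR
}
```

Re-running the search within each resample is what charges the estimate for the selection algorithm itself, rather than for the chosen subgroup alone.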

Code
fs_minSG <- forestsearch(df.analysis=df_nonAP, confounders.name=confounders.name,
  outcome.name="y.sim", treat.name="treat.sim", event.name="event.sim",
  potentialOutcome.name="loghr.po",
  hr.threshold = 0.90, hr.consistency = 0.90, pconsistency.threshold = 0.90,
  showten_subgroups=FALSE, details=FALSE, sg_focus="minSG",
  conf_force=c("age <= 65","biomarker < 1","biomarker < 2","biomarker < 3",
    "biomarker < 4","biomarker < 5","biomarker < 6","biomarker < 7","biomarker < 8",
    "biomarker < 9","biomarker < 10"),
  exclude_cuts=c("biomarker <="), maxk=1, plot.sg=FALSE)

library(doFuture)
library(doRNG)
registerDoFuture()
registerDoRNG()

t.start<-proc.time()[3]

NB <- nboots

# Bootstrap bias-correction 
fs_bc <- forestsearch_bootstrap_dofuture(fs.est=fs_minSG,nb_boots=NB,show_three=TRUE,details=TRUE)
t.now<-proc.time()[3]
t.min<-(t.now-t.start)/60
cat("Minutes (doFuture) bootstrap",c(t.min),"\n")


if(output) save(fs_minSG, fs_bc, NB, file="output/boot3_minSG_k1_v0.Rdata")
Code
if(loadit) load(file="output/boot3_minSG_k1_v0.Rdata")

gt(as.data.frame(fs_bc$FSsg_tab), auto_align=FALSE) 
| Subgroup     | n         | n1        | events    | m1  | m0   | RMST                | HR (95% CI)      | HR*              | AHR(po) |
|--------------|-----------|-----------|-----------|-----|------|---------------------|------------------|------------------|---------|
| Questionable | 95 (25%)  | 46 (48%)  | 94 (99%)  | 6.2 | 8.2  | -3.77 (-7.14,-0.39) | 1.66 (1.09,2.52) | 1.59 (0.92,2.77) | 2.52    |
| Recommend    | 287 (75%) | 143 (50%) | 249 (87%) | 16  | 11.8 | 7.28 (3.68,10.87)   | 0.55 (0.42,0.71) | 0.55 (0.38,0.8)  | 0.49    |
Code
dfnew <- copy(df_nonAP)
dfnew$control.sim <- 1-dfnew$treat.sim

fs_maxSG <- forestsearch(df.analysis=dfnew, confounders.name=confounders.name,
  outcome.name="y.sim", treat.name="control.sim", event.name="event.sim",
  potentialOutcome.name="loghr.po",
  est.scale="1/hr", hr.threshold = 1/0.6, hr.consistency = 1/0.7, pconsistency.threshold = 0.90,
  showten_subgroups=FALSE, details=TRUE, sg_focus="maxSG",
  conf_force=c("age <= 65","biomarker < 1","biomarker < 2","biomarker < 3",
    "biomarker < 4","biomarker < 5","biomarker < 6","biomarker < 7","biomarker < 8",
    "biomarker < 9","biomarker < 10"),
  exclude_cuts=c("biomarker <="), maxk=1, plot.sg=FALSE)

library(doFuture)
library(doRNG)
registerDoFuture()
registerDoRNG()

t.start<-proc.time()[3]

NB <- nboots

# Bootstrap bias-correction 
fs_bc <- forestsearch_bootstrap_dofuture(fs.est=fs_maxSG,nb_boots=NB,show_three=TRUE,details=TRUE)
t.now<-proc.time()[3]
t.min<-(t.now-t.start)/60
cat("Minutes (doFuture) bootstrap",c(t.min),"\n")

if(output) save(fs_maxSG, fs_bc, NB, file="output/boot3_maxSG_k1_v0.Rdata")
Code
if(loadit) load(file="output/boot3_maxSG_k1_v0.Rdata")

gt(as.data.frame(fs_bc$FSsg_tab), auto_align=FALSE) 
| Subgroup     | n         | n1        | events    | m1  | m0   | RMST                | HR (95% CI)      | HR*              | AHR(po) |
|--------------|-----------|-----------|-----------|-----|------|---------------------|------------------|------------------|---------|
| Questionable | 95 (25%)  | 46 (48%)  | 94 (99%)  | 6.2 | 8.2  | -3.77 (-7.14,-0.39) | 1.66 (1.09,2.52) | 1.6 (0.93,2.74)  | 2.52    |
| Recommend    | 287 (75%) | 143 (50%) | 249 (87%) | 16  | 11.8 | 7.28 (3.68,10.87)   | 0.55 (0.42,0.71) | 0.56 (0.38,0.8)  | 0.49    |
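The reversal device used in the fs_maxSG analysis (fitting on control.sim = 1 - treat.sim with est.scale="1/hr") rests on a simple Cox-model fact: flipping a binary arm indicator negates its coefficient, so the fitted hazard ratio is exactly inverted. A minimal sketch with simulated exponential survival data (toy parameters, not trial data):

```r
library(survival)

set.seed(1)
n <- 400
treat <- rbinom(n, 1, 0.5)
# Exponential survival with true HR 0.6 for treat = 1,
# administrative censoring at 24 months
time  <- rexp(n, rate = 0.10 * 0.6^treat)
event <- as.integer(time < 24)
time  <- pmin(time, 24)

hr_treat   <- exp(coef(coxph(Surv(time, event) ~ treat)))
hr_control <- exp(coef(coxph(Surv(time, event) ~ I(1 - treat))))

# Reversing the arm indicator yields the reciprocal hazard ratio
c(hr_treat = unname(hr_treat), hr_control = unname(hr_control),
  recip = unname(1 / hr_treat))
```

This is why thresholds for the reversed analysis are stated on the reciprocal scale (hr.threshold = 1/0.6, hr.consistency = 1/0.7) in the fs_maxSG call.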
Code
tALL.now<-proc.time()[3]
t.min<-(tALL.now-tALL.start)/60
cat("Minutes for ALL Analyses",c(t.min),"\n")
Minutes for ALL Analyses 0.2235167 

References

Aalen, O. O., R. J. Cook, and K. Røysland. 2015. “Does Cox Analysis of a Randomized Survival Study Yield a Causal Treatment Effect?” Lifetime Data Analysis, 579–93.
Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019. “Generalized Random Forests.” The Annals of Statistics 47 (2): 1148–78.
Cui, Yifan, Michael R Kosorok, Erik Sverdrup, Stefan Wager, and Ruoqing Zhu. 2023. “Estimating Heterogeneous Treatment Effects with Right-Censored Data via Causal Survival Forests.” Journal of the Royal Statistical Society Series B: Statistical Methodology, February.
León, Larry F., Thomas Jemielita, Zifang Guo, Rachel Marceau West, and Keaven M. Anderson. 2024. “Exploratory Subgroup Identification in the Heterogeneous Cox Model: A Relatively Simple Procedure.” Statistics in Medicine 43 (20): 3921–42. https://doi.org/10.1002/sim.10163.
Tibshirani, Julie, Susan Athey, Rina Friedberg, Vitor Hadad, David Hirshberg, Luke Miner, Erik Sverdrup, Stefan Wager, and Marvin Wright. 2022. “Package ‘grf’.”

Footnotes

  1. The “splitting consistency criterion” judges the robustness of a candidate subgroup, i.e., whether its effect is likely real rather than spurious.↩︎